
Universitext

Robert G. Underwood

Cryptography
for Secure
Encryption
Universitext

Series Editors
Carles Casacuberta, Universitat de Barcelona, Barcelona, Spain
John Greenlees, University of Warwick, Coventry, UK
Angus MacIntyre, Queen Mary University of London, London, UK
Claude Sabbah, École Polytechnique, CNRS, Université Paris-Saclay, Palaiseau,
France
Endre Süli, University of Oxford, Oxford, UK
Universitext is a series of textbooks that presents material from a wide variety of
mathematical disciplines at master’s level and beyond. The books, often well class-
tested by their author, may have an informal, personal, even experimental approach
to their subject matter. Some of the most successful and established books in the
series have evolved through several editions, always following the evolution of
teaching curricula, into very polished texts.
Thus as research topics trickle down into graduate-level teaching, first textbooks
written for new, cutting-edge courses may make their way into Universitext.
Robert G. Underwood

Cryptography for Secure Encryption
Robert G. Underwood
Auburn University at Montgomery
Montgomery, AL, USA

ISSN 0172-5939 ISSN 2191-6675 (electronic)


Universitext
ISBN 978-3-030-97901-0 ISBN 978-3-030-97902-7 (eBook)
https://doi.org/10.1007/978-3-030-97902-7

Mathematics Subject Classification: 94A60, 68P25, 20K01, 11T71, 14G50

© Springer Nature Switzerland AG 2022


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

Cryptography for Secure Encryption is a book about how concepts in mathematics
are used to develop the basic components of cryptography. At times, the
mathematical foundations are as important as the cryptography. The text assumes
basic knowledge of calculus and linear algebra.
The book can be used for a course in cryptography in a master’s degree program
in mathematics, computer science, or cyber security. It can also be used in a senior-
level undergraduate course in cryptography for mathematics or computer science
majors.
In the first part of the book (Chapters 2–7), we introduce concepts from
probability theory, information theory, complexity theory, modern algebra, and
number theory that will be used later in our study of cryptography.
In the second part of the book (Chapters 8–14), we develop the basic notions of
cryptography.
We outline the contents of the second part of the book in detail. In Chapter 8, we
present the major symmetric key cryptosystems, including the simple substitution
cryptosystem, the affine cipher, the Hill 2×2 cipher, the Vigenère cipher, the Vernam
cipher, and the stream cipher. We discuss Feistel-type block ciphers such as DES and
AES. Methods of cryptanalysis such as frequency analysis and the Kasiski method
are also given.
In Chapter 9, we introduce public key cryptography, including the two most
important public key cryptosystems: RSA and ElGamal. We discuss standard attacks
on these cryptosystems.
In Chapter 10, we discuss digital signature schemes and hash functions.
In Chapter 11, we consider the construction of bit generators for use in stream
ciphers. We give a practical definition of “randomness” of a sequence of bits
(the next-bit test) and show that under the discrete logarithm assumption, the
Blum-Micali bit generator is pseudorandom. Moreover, under the quadratic residue
assumption, we prove that the Blum-Blum-Shub bit generator is pseudorandom.
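The Blum-Blum-Shub generator previewed here is simple to state: fix a Blum integer n = pq (p, q prime, both congruent to 3 mod 4), iterate x → x² mod n, and output the low-order bit of each state. The book's own code is in GAP; the following Python sketch, with its function name and deliberately tiny, insecure parameters, is purely illustrative:

```python
# Toy Blum-Blum-Shub bit generator: square modulo n and emit parity bits.
# The modulus below is far too small for any real use; it only illustrates
# the iteration x -> x^2 mod n.
def bbs_bits(seed, n, count):
    x = (seed * seed) % n  # initial squaring of the seed
    bits = []
    for _ in range(count):
        x = (x * x) % n     # next state
        bits.append(x & 1)  # output the low-order bit
    return bits

n = 11 * 19  # 11 and 19 are both 3 mod 4, so n is a (toy) Blum integer
print(bbs_bits(seed=3, n=n, count=8))
```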
In Chapter 12, we address the problem of distribution of keys. We introduce the
Diffie-Hellman key exchange protocol (DHKEP) and discuss standard attacks on the
DHKEP including the man-in-the-middle attack, baby-step/giant-step, and the index
calculus. Index calculus is an attack tailored to the choice of group G = U (Zp ), p
prime, in the DHKEP.
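The Diffie-Hellman exchange itself fits in a few lines. Here is a toy Python sketch with illustratively small public parameters (the book's code examples are in GAP; the numbers below are mine and far too small to be secure):

```python
# Toy Diffie-Hellman key exchange in U(Z_p).
p, g = 23, 5              # public: a small prime modulus and a generator of U(Z_23)
a, b = 6, 15              # Alice's and Bob's secret exponents
A = pow(g, a, p)          # Alice transmits A = g^a mod p
B = pow(g, b, p)          # Bob transmits B = g^b mod p
key_alice = pow(B, a, p)  # Alice computes (g^b)^a mod p
key_bob = pow(A, b, p)    # Bob computes (g^a)^b mod p
print(key_alice == key_bob)  # both hold the same shared key g^(ab) mod p
```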
In Chapter 13, we introduce elliptic curves and the elliptic curve group. An
elliptic curve over a field K is the set of points satisfying an equation of the form

y² = x³ + ax + b,

where the curve is smooth, that is, the cubic has non-zero elliptic discriminant. The
set of points on an elliptic curve, together with the point at infinity, is endowed with
a binary operation (point addition) to yield the elliptic curve group E(K).
A cyclic subgroup of E(K) is used in the Diffie-Hellman key exchange protocol
in place of U (Zp ) to define the elliptic curve key exchange protocol (ECKEP). The
ECKEP is more secure than the ordinary Diffie-Hellman protocol since the index
calculus attack cannot be applied to an elliptic curve group.
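The smoothness condition is easy to test in code: the cubic x³ + ax + b has a repeated root exactly when 4a³ + 27b² = 0 (assuming the field characteristic is not 2 or 3). A small Python sketch (the book uses GAP; this helper is my own illustration):

```python
# Check whether y^2 = x^3 + a*x + b defines a smooth (elliptic) curve,
# i.e. whether 4a^3 + 27b^2 is non-zero (optionally modulo a prime p).
def is_smooth(a, b, p=None):
    disc = 4 * a**3 + 27 * b**2
    return disc % p != 0 if p is not None else disc != 0

print(is_smooth(-1, 0))      # y^2 = x^3 - x is smooth over the rationals
print(is_smooth(0, 0))       # y^2 = x^3 has a cusp, so it is not smooth
print(is_smooth(3, 2, p=5))  # a curve considered over the field Z_5
```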
In Chapter 14, we consider the case where the curve y² = x³ + ax + b is not
smooth. We explore the connection between the group of points on such curves (the
non-singular points Ens (K)) and another group of points Gc (K), which generalizes
the circle group. This is an active area of research which may provide insight into
the nature of point addition in the smooth case.
Each chapter contains a set of exercises of various degrees of difficulty that help
summarize and review the main ideas of the chapter.
I have found that cryptographic algorithms and protocols provide for good
programming problems. At various places in the text, I have included GAP code.
GAP is a powerful computer algebra system, distributed freely, and available at
gap-system.org

Course Outlines

Cryptography for Secure Encryption contains more material than can be covered in
a one-semester course. Instructors can choose topics that reflect their requirements
and interests, and the level of preparation of their students.
For a one-semester course in cryptography for undergraduates, a suggested
course outline is:
Chapter 1: Sections 1.1, 1.2, 1.3.
Chapter 2: Sections 2.1, 2.2, 2.3, 2.4.
Chapter 3: Sections 3.1, 3.2, 3.4.
Chapter 4: Sections 4.1, 4.2, 4.3, 4.4.
Chapter 5: Sections 5.1, 5.2, 5.3.
Chapter 6: Sections 6.1, 6.2, 6.3, 6.4.
Chapter 7: Section 7.4.
Chapter 8: Sections 8.1, 8.2, 8.3, 8.4, 8.5.
Chapter 9: Sections 9.1, 9.2, 9.3, 9.4, 9.6.

For a one-semester graduate-level course, a suggested course outline is:

Chapters 1–4 as above, but assigned as reading or light coverage in class.
Chapter 5: Sections 5.1, 5.2, 5.3.
Chapter 6: Sections 6.1, 6.2, 6.3, 6.4.
Chapter 7: Section 7.4.
Chapter 8: Sections 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7.
Chapter 9: Sections 9.1, 9.2, 9.3, 9.4, 9.5, 9.6.
Chapter 10: Sections 10.1, 10.2.
Chapter 11: Sections 11.1, 11.2, 11.3, 11.4.
Chapter 12: Sections 12.1, 12.2.
Chapter 13: Sections 13.1, 13.2, 13.4, if time permits.

What’s Not in the Book

This book is not meant to be a comprehensive text in cryptography. There are certain
topics covered very briefly or omitted entirely. For instance, there is no significant
history of cryptography in the text. For a well-written, concise account of some
history, see [33, Chapter 1, Section 1].
Moreover, I did not include any quantum cryptography. For a discussion of the
quantum key distribution protocol, see [37, Chapter 4, Section 4.4.5].

Acknowledgments

This book has its origins in notes for a one-semester course in cryptography, which
is part of a master’s degree program in cyber security at Auburn University at
Montgomery in Montgomery, Alabama.
I would like to thank current and former colleagues for their support and
encouragement during the writing of this book: Dr. Babak Rahbarinia (now at
Salesforce), who read an early draft, Dr. Luis Cueva-Parra (now at the University of
North Georgia), Dr. Yi Wang, Dr. Lei Wu, Dr. Matthew Ragland, Dr. Semih Dinc,
Dr. Patrick Pape (now at the University of Alabama in Huntsville), Dr. Enoch Lee,
and the many cryptography students at AUM who have used the course notes.
I also want to thank Elizabeth Loew at Springer for her interest in this project
and her encouragement and guidance in helping to develop the manuscript.
I thank the reviewers for their helpful comments and suggestions, which have
improved the material in this book. I am indebted to readers of the first draft who
kindly agreed to read a second draft and provided many useful comments on the
manuscript.

I thank my wife Rebecca and my son Andre for accepting the many hours that
have been devoted to this book. I acknowledge that the musical works of the Grateful
Dead, along with the literary efforts of Richard Brautigan, have been an influence
and inspiration.

Montgomery, AL, USA Robert G. Underwood


Contents

1 Introduction to Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Introduction to Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 The Players in the Game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Ciphertext Only Attack: An Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2 Introduction to Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1 Introduction to Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1 Abstract Probability Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Collision Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.5 2-Dimensional Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6 Bernoulli’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3 Information Theory and Entropy. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1 Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1.1 Entropy and Randomness: Jensen’s Inequality. . . . . . . . . . . . 31
3.2 Entropy of Plaintext English . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2.1 ASCII Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3 Joint and Conditional Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.4 Unicity Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4 Introduction to Complexity Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.1 Basics of Complexity Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2 Polynomial Time Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.3 Non-polynomial Time Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.4 Complexity Classes P, PP, BPP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.4.1 Probabilistic Polynomial Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.4.2 An Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66


4.5 Probabilistic Algorithms for Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68


4.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5 Algebraic Foundations: Groups. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.1 Introduction to Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.2 Examples of Infinite Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.3 Examples of Finite Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.3.1 The Symmetric Group on n Letters . . . . . . . . . . . . . . . . . . . . . . . . 77
5.3.2 The Group of Residues Modulo n . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.4 Direct Product of Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.5 Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.6 Homomorphisms of Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.7 Group Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.7.1 Some Number Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6 Algebraic Foundations: Rings and Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.1 Introduction to Rings and Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.1.1 Polynomials in F [x] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.2 The Group of Units of Zn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.2.1 A Formula for Euler’s Function. . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
6.3 U (Zp ) Is Cyclic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.4 Exponentiation in Zn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.4.1 Quadratic Residues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
7 Advanced Topics in Algebra. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.1 Quotient Rings and Ring Homomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.1.1 Quotient Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.1.2 Ring Homomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
7.2 Simple Algebraic Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
7.2.1 Algebraic Closure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
7.3 Finite Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
7.4 Invertible Matrices over Zpq . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
7.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
8 Symmetric Key Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
8.1 Simple Substitution Cryptosystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
8.1.1 Unicity Distance of the Simple Substitution
Cryptosystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
8.2 The Affine Cipher . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
8.2.1 Unicity Distance of the Affine Cipher . . . . . . . . . . . . . . . . . . . . . 145
8.3 The Hill 2 × 2 Cipher . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
8.3.1 Unicity Distance of the Hill 2 × 2 Cipher . . . . . . . . . . . . . . . . . 148
8.4 Cryptanalysis of the Simple Substitution Cryptosystem . . . . . . . . . . . 149
8.5 Polyalphabetic Cryptosystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
8.5.1 The Vigenère Cipher . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

8.5.2 Unicity Distance of the Vigenère Cipher . . . . . . . . . . . . . . . . . . 157


8.5.3 Cryptanalysis of the Vigenère Cipher . . . . . . . . . . . . . . . . . . . . . 157
8.5.4 The Vernam Cipher . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
8.5.5 Unicity Distance of the Vernam Cipher . . . . . . . . . . . . . . . . . . . 162
8.6 Stream Ciphers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
8.7 Block Ciphers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
8.7.1 Iterated Block Ciphers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
8.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
9 Public Key Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
9.1 Introduction to Public Key Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
9.1.1 Negligible Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
9.1.2 One-Way Trapdoor Functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
9.2 The RSA Public Key Cryptosystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
9.3 Security of RSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
9.3.1 Pollard p − 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
9.3.2 Pollard ρ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
9.3.3 Difference of Two Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
9.4 The ElGamal Public Key Cryptosystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
9.5 Hybrid Ciphers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
9.6 Symmetric vs. Public Key Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
9.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
10 Digital Signature Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
10.1 Introduction to Digital Signature Schemes. . . . . . . . . . . . . . . . . . . . . . . . . . 203
10.2 The RSA Digital Signature Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
10.3 Signature with Privacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
10.4 Security of Digital Signature Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
10.5 Hash Functions and DSS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
10.5.1 The Discrete Log Family . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
10.5.2 The MD-4 Family . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
10.5.3 Hash-Then-Sign DSS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
10.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
11 Key Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
11.1 Linearly Recursive Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
11.2 The Shrinking Generator Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
11.3 Linear Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
11.4 Pseudorandom Bit Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
11.4.1 Hard-Core Predicates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
11.4.2 Hard-Core Predicates and the DLA. . . . . . . . . . . . . . . . . . . . . . . . 238
11.4.3 The Blum–Micali Bit Generator . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
11.4.4 The Quadratic Residue Assumption . . . . . . . . . . . . . . . . . . . . . . . 245
11.4.5 The Blum–Blum–Shub Bit Generator . . . . . . . . . . . . . . . . . . . . . 247
11.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251

12 Key Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255


12.1 The Diffie–Hellman Key Exchange Protocol . . . . . . . . . . . . . . . . . . . . . . . 255
12.2 The Discrete Logarithm Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
12.2.1 The General DLP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
12.2.2 Index Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
12.2.3 Efficiency of Index Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
12.2.4 The Man-in-the-Middle Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
12.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
13 Elliptic Curves in Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
13.1 The Equation y² = x³ + ax + b . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
13.2 Elliptic Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
13.3 Singular Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
13.4 The Elliptic Curve Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
13.4.1 Structure of E(K) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
13.5 The Elliptic Curve Key Exchange Protocol . . . . . . . . . . . . . . . . . . . . . . . . . 287
13.5.1 Comparing ECKEP and DHKEP . . . . . . . . . . . . . . . . . . . . . . . . . . 289
13.5.2 What Elliptic Curves to Avoid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
13.5.3 Examples of Good Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
13.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
14 Singular Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
14.1 The Group Ens (K) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
14.2 The DLP in Ens (K) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
14.3 The Group Gc (K) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
14.4 Ens (K) ≅ Gc (K) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
14.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
Chapter 1
Introduction to Cryptography

1.1 Introduction to Cryptography

Cryptography, from the Greek meaning “secret writing,” is the science of making
messages unintelligible to all except those for whom the messages are intended.
In this way, sensitive or private messages are kept secure and protected from
unauthorized or unwanted access by persons or entities other than the intended
recipients.
The message intended to be sent is the plaintext. An example of plaintext is

Eyes of the World.

The communications are kept secure by converting the plaintext into ciphertext
by a process called encryption. Encryption is achieved through the use of an
encryption transformation. An example of an encryption transformation is
e = replace each letter of the plaintext with the letter that is k places to the right in the
alphabet, where k is an integer that satisfies k ∈ {0, 1, 2, . . . , 25}.

(Note: in the encryption transformation e, the alphabet wraps around, and thus, if
k = 2, then the letter y is replaced with a, z is replaced with b, and so on.)
The person sending the plaintext can only encrypt if she has chosen a value
for k, which is the “key” to encryption. Generally, an encryption key is the
additional information required so that plaintext can be converted to ciphertext
using an encryption transformation. Given an encryption transformation, the set
of all possible encryption keys is the encryption keyspace. For the encryption
transformation e given above, the keyspace is {0, 1, 2, . . . , 25}.
With e as above and encryption key k = 2, the encryption of the plaintext Eyes
of the World is the ciphertext

Gagu qh vjg Yqtnf;


Eyes of the World  →  (encryption)  →  Gagu qh vjg Yqtnf

Gagu qh vjg Yqtnf  →  (transmission, then decryption)  →  Eyes of the World

Fig. 1.1 Encryption–decryption process, k = 2

if k = 17, then the ciphertext is

Vpvj fw kyv Nficu.
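A minimal sketch of the encryption transformation e in Python (the book's code is in GAP; the function name here is mine), shifting each letter k places to the right and leaving spaces and punctuation alone:

```python
# Shift-cipher encryption: replace each letter with the letter k places
# to the right in the alphabet, wrapping around and preserving case.
def encrypt(plaintext, k):
    out = []
    for ch in plaintext:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr(base + (ord(ch) - base + k) % 26))
        else:
            out.append(ch)  # non-letters pass through unchanged
    return ''.join(out)

print(encrypt("Eyes of the World", 2))   # Gagu qh vjg Yqtnf
print(encrypt("Eyes of the World", 17))  # Vpvj fw kyv Nficu
```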

The sender of the message then transmits the ciphertext to the recipient, who con-
verts it back into the original plaintext by a process called decryption. Decryption
is achieved using a decryption transformation. Decryption of ciphertext should
only be possible if the recipient is in possession of the key for decryption, called
the decryption key. For a given decryption transformation, the set of all possible
decryption keys is the decryption keyspace.
For the example above, the decryption transformation is
d = replace each letter of the ciphertext with the letter that is k places to the left in the
alphabet, where k is an integer that satisfies k ∈ {0, 1, 2, . . . , 25},

and the decryption keyspace is {0, 1, 2, . . . , 25}. With decryption key k = 2, the
decryption of Gagu qh vjg Yqtnf is

Eyes of the World.
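The decryption transformation d is the same walk through the alphabet in the opposite direction; a matching Python sketch (again, the names are mine, not from the text):

```python
# Shift-cipher decryption: replace each letter with the letter k places
# to the left in the alphabet, wrapping around and preserving case.
def decrypt(ciphertext, k):
    out = []
    for ch in ciphertext:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr(base + (ord(ch) - base - k) % 26))
        else:
            out.append(ch)  # non-letters pass through unchanged
    return ''.join(out)

print(decrypt("Gagu qh vjg Yqtnf", 2))  # Eyes of the World
```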

The encryption–decryption process is illustrated in Figure 1.1.


In the example of Figure 1.1, both the sender and the recipient share the same
encryption–decryption key, namely, k = 2. This encryption–decryption process is
an example of a symmetric (or shared-key) cryptosystem.

1.2 The Players in the Game

We introduce some formal notation for a cryptosystem. We assume the person
sending plaintext message M is “Alice,” denoted by A, and the intended recipient
is “Bob,” denoted by B. Let e denote an encryption transformation and let ke be
an encryption key. Let d denote the corresponding decryption transformation with
decryption key kd . Let

C = e(M, ke )

denote the encryption of M into ciphertext C using transformation e and key ke .


Then

M = d(C, kd ) = d(e(M, ke ), kd ) (1.1)

denotes the decryption of C back to M. In the example of Section 1.1,

M = Eyes of the World,

e = replace each letter of the plaintext with the letter that is k places to the right in the
alphabet where k ∈ {0, 1, 2, . . . , 25},

ke = 2,

and

C = Gagu qh vjg Yqtnf.

Thus in formal notation, the encryption–decryption process in Figure 1.1 can be


written as

Gagu qh vjg Yqtnf = e(Eyes of the World, 2),

Eyes of the World = d(Gagu qh vjg Yqtnf, 2).
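The right-shift transformation and its inverse are easy to experiment with. Below is a minimal Python sketch (the names shift, e, and d are illustrative, not from the text); it reproduces the encryption and decryption of Eyes of the World with k = 2:

```python
def shift(text, k):
    """Replace each letter with the letter k places to the right (mod 26)."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + k) % 26 + base))
        else:
            out.append(ch)  # spaces and punctuation pass through unchanged
    return ''.join(out)

def e(M, ke):
    """Encryption: right shift by ke."""
    return shift(M, ke)

def d(C, kd):
    """Decryption: left shift by kd."""
    return shift(C, -kd)

print(e("Eyes of the World", 2))  # Gagu qh vjg Yqtnf
print(d("Gagu qh vjg Yqtnf", 2))  # Eyes of the World
```

Non-letters are left unchanged, matching the treatment of spaces in the examples above.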

Definition 1.2.1 A cryptosystem is a system

⟨M, C, e, d, Ke, Kd⟩,

where M is a collection of plaintext messages, C is the corresponding set of


ciphertext, e is an encryption transformation with keyspace Ke , and d is the
decryption transformation with keyspace Kd .
Security of a cryptosystem relies on the fact that the decryption key kd ∈ Kd is
known only to Bob (and possibly Alice or a trusted third party). As we have seen,
a symmetric cryptosystem is one in which both Alice and Bob have a shared secret
key for encryption and decryption: ke = kd . A cryptosystem in which ke ≠ kd is an
asymmetric cryptosystem.
In an asymmetric cryptosystem, Alice does not know Bob’s secret key for
decryption, kd ; only Bob knows his key for decryption. The key for encrypting
messages to Bob, ke , however, is made public (it is not secret). For this reason,

an asymmetric cryptosystem is also called a public key cryptosystem. In public


key cryptography, we write the encryption–decryption relationship as

C = e(M), (1.2)

M = d(C, kd ) = d(e(M), kd ). (1.3)

In Chapter 8, we will review the major symmetric key cryptosystems, and in


Chapter 9, we will consider public key cryptosystems.
Suppose Alice intends to send messages to Bob using the cryptosystem
⟨M, C, e, d, Ke, Kd⟩. There is another player in this game, “Malice” (sometimes
also called “Eve”), whose goal is to intercept ciphertext, obtain knowledge of
the transformations e and d and the keys kd and ke (if not public), and use this
information to decipher ciphertext by some method or process.
Consequently, Malice eavesdrops on as many conversations between Alice and
Bob as possible. Malice will also pose as Alice and send messages to Bob, with
Bob thinking they are coming from Alice, or pose as Bob to receive messages from
Alice that Alice thinks are going to Bob. The result: Total breakdown of secure
communications between Alice and Bob!
Malice’s activities are known as attacks on the cryptosystem. If Malice only has
some ciphertext C, then the attack is known as a ciphertext only attack. If Malice
has obtained a pairing M, C = e(M, ke ) for some M ∈ M, then the attack is a
known plaintext attack. If Malice has a pairing M, C = e(M, ke ), where M is
selected by Malice, then the attack is a chosen plaintext attack.
Generally, Malice is engaging in cryptanalysis of the cryptosystem. The science
and art of obtaining plaintext from its corresponding ciphertext without a priori
knowledge of e, d, ke , or kd is cryptanalysis. Cryptography and cryptanalysis
together are known as cryptology.

1.3 Ciphertext Only Attack: An Example

Suppose Alice and Bob have agreed to use the symmetric cryptosystem
⟨M, C, e, d, K⟩, where
e = replace each letter of the plaintext with the letter that is k places to the right in the
alphabet where k ∈ {0, 1, 2, . . . , 25}.

(Here, we assume that K = Ke = Kd since this is a symmetric cryptosystem.) They


have communicated beforehand and have agreed to use the shared key k = 18 which
they have kept as a secret—only Alice and Bob know the value of k.
Suppose Alice encrypts a plaintext message, and in the transmission of the
ciphertext to Bob, Malice obtains some of the ciphertext. Malice then engages in
a ciphertext only attack to obtain the key by decrypting the ciphertext with every

possible key. Only one key, the correct key k, should result in a decryption that is
legible, legitimate plaintext. This method of attack is called key trial. Key trial is an
example of a brute-force method of cryptanalysis since every possible key is tested.
Example 1.3.1 Alice encrypts the message

M = Move supplies north

as

C = e(M, 18) = Egnw kmhhdawk fgjlz.

Malice is able to intercept the ciphertext Egnw and uses key trial to obtain the key
and some plaintext. He guesses that the ciphertext has been encrypted using e, and
so the decryption transformation is
d = replace each letter of the ciphertext with the letter that is k places to the left in the
alphabet where k ∈ {0, 1, 2, . . . , 25}.

Malice computes d(Egnw, k) for all possible keys k = 0, 1, 2, . . . , 25:

k d(Egnw, k) k d(Egnw, k)
0 Egnw 13 Rtaj
1 Dfmv 14 Qszi
2 Celu 15 Pryh
3 Bdkt 16 Oqxg
4 Acjs 17 Npwf
5 Zbir 18 Move
6 Yahq 19 Lnud
7 Xzgp 20 Kmtc
8 Wyfo 21 Jlsb
9 Vxen 22 Ikra
10 Uwdm 23 Hjqz
11 Tvcl 24 Gipy
12 Subk 25 Fhox

Malice observes that Move is the only legible English word in the list and so deduces
that k = 18.
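Malice’s key trial can be sketched directly: decrypt the intercepted ciphertext under every key in the keyspace and scan the 26 candidates for legible English. A minimal sketch (the legibility check is reduced to an exact-match test, which stands in for human judgment):

```python
def d(C, k):
    """Replace each letter of C with the letter k places to the left (mod 26)."""
    out = []
    for ch in C:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base - k) % 26 + base))
        else:
            out.append(ch)
    return ''.join(out)

# Decrypt the intercepted ciphertext under all 26 possible keys.
candidates = {k: d("Egnw", k) for k in range(26)}

# Malice scans the list for legible English; here the check is an exact match.
hits = [k for k, word in candidates.items() if word == "Move"]
print(hits)  # [18]
```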

The success of the cryptanalysis technique of key trial depends on two factors:
1. The existence of a relatively small number of keys, making exhaustive testing of
each key feasible
2. The unlikelihood that two different keys produce recognizable plaintext after
decryption

Example 1.3.2 In this example, Alice encrypts the plaintext

M = IBM COMPUTERS

as

C = e(M, 4) = MFQ GSQTYXIVW.

In this case, Malice is able to intercept the ciphertext MFQ and again uses key trial
to find the key:

k d(MFQ, k) k d(MFQ, k)
0 MFQ 13 ZSD
1 LEP 14 YRC
2 KDO 15 XQB
3 JCN 16 WPA
4 IBM 17 VOZ
5 HAL 18 UNY
6 GZK 19 TMX
7 FYJ 20 SLW
8 EXI 21 RKV
9 DWH 22 QJU
10 CVG 23 PIT
11 BUF 24 OHS
12 ATE 25 NGR

There are at least four legible words produced: IBM, HAL, ATE, and PIT. So Malice
can only conclude that the key is either 4, 5, 12, or 23.


In Example 1.3.2, none of the keys 5, 12, and 23 are the correct key, yet they
produce legible plaintext. These keys are “spurious” keys.
More formally, let ⟨M, C, e, d, K⟩ be a symmetric cryptosystem with shared
secret key k. Let C be the encryption of plaintext message M using k, i.e., C =
e(M, k). A key l ∈ K, other than k, for which the decryption of C with l results
in legible, legitimate, plaintext is a spurious key for C. Said differently, a spurious
key is a key l = k for which d(C, l) ∈ M.
In Example 1.3.2, the spurious keys for MFQ are 5, 12, 23; in Example 1.3.1,
there are no spurious keys for Egnw.
As the number of characters in the ciphertext increases to ∞, the number of
spurious keys decreases to 0. For a sufficiently long string of ciphertext, there are no
spurious keys; only the correct key k results in a legitimate decryption.
For a given (symmetric) cryptosystem ⟨M, C, e, d, K⟩, the minimum length of
ciphertext that guarantees no spurious keys is difficult to compute. We can, however,

compute a lower bound for this minimum value; this lower bound is the unicity
distance of the cryptosystem.
More precisely, the unicity distance is a lower bound for the size n0 of encrypted
words so that there is a unique key k ∈ K that maps C (consisting of ciphertext of
length n0 ) into M.
The unicity distance is a theoretical measure of the ability of the cryptosystem to
withstand a ciphertext only attack.
For the type of “right shift” transformations we have seen in our examples, it has
been computed that the unicity distance is

log2(26)/3.2 ≈ 4.7/3.2 ≈ 1.47

characters of ciphertext, see Section 3.4.
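The quoted figure is quick to verify numerically; here 3.2 bits per character is the redundancy estimate used by the text (see Section 3.4):

```python
import math

# Unicity distance for the shift cipher: key entropy log2(26) bits divided by
# the per-character redundancy of English (about 3.2 bits, per the text).
u = math.log2(26) / 3.2
print(round(u, 2))  # ≈ 1.47 characters of ciphertext
```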


In Example 1.3.1, the length of the ciphertext is 4, which is greater than the
unicity distance 1.47. We had no spurious keys.
In Example 1.3.2, the length of the ciphertext is 3, which is greater than 1.47, yet
there were 3 spurious keys (the correct key is not uniquely determined). This finding
is not a contradiction: as we recall, the unicity distance is only a lower bound for
the minimal length of ciphertext needed to guarantee no spurious keys.
We will discuss spurious keys and unicity distance in detail in Section 3.4.

1.4 Exercises

1. Let e be the encryption transformation defined as


e = replace each letter of the plaintext with the letter that is k places to the right in the
alphabet, where k is an integer that satisfies k ∈ {0, 1, 2, . . . , 25}.

Suppose that ke = 5. Compute C = e(Montgomery, 5).


2. Let d be the decryption transformation defined as
d = replace each letter of the ciphertext with the letter that is k places to the left in the
alphabet, where k is an integer that satisfies k ∈ {0, 1, 2, . . . , 25}.

Suppose that kd = 12. Compute M = d(mxsqndm, 12).


3. Suppose that Alice sends Bob the ciphertext C = ZLUKOLSW using the
encryption transformation e of Exercise 1. During transmission, Malice obtains
the partial ciphertext ZLUK. Find the value of ke and kd . What is the plaintext
message? What type of attack is Malice engaged in?
4. Suppose that Malice knows that Alice and Bob are using e and d as in
Section 1.1 to communicate securely. Suppose that Malice has obtained the
plaintext–ciphertext pairing (secret, bnlanc). Explain how Malice can obtain
the encryption key ke . What is the value of ke ?
5. Consider the following variation of the cryptosystem given in Section 1.1. In
this case, the alphabet is the three-element set of letters {a, b, c}, and plaintext

messages are finite sequences of a’s, b’s, and c’s. Thus the message space M =
{a, b, c}∗ . Let e be the encryption transformation defined as
e = replace each letter of the plaintext with the letter that is k places to the right in the
alphabet, where k is an integer that satisfies k ∈ {0, 1, 2}.

Using this cryptosystem, compute the following:


(a) C = e(accbbac, 2)
(b) M = d(abcacbc, 1)
6. Suppose that Alice is using the “right shift” encryption transformation given in
Section 1.1 to encrypt plaintext messages. Alice encrypts a two-letter plaintext
word as tw. Compute the number of spurious keys for tw.
7. Suppose that Alice is using the “right shift” encryption transformation given in
Section 1.1 to compute the ciphertext C = e(M, k), where M ∈ M, k ∈ K,
k ≠ 0. If C ∈ M, prove that there is at least one spurious key.
Chapter 2
Introduction to Probability

In this chapter we review some basic notions of probability that we will need in the
chapters that follow.

2.1 Introduction to Probability

Probability concerns the outcomes of experiments. A nondeterministic experiment
is an experiment in which the outcome cannot be predicted. For example,
the following are nondeterministic experiments:
E1 = a fair die is cast and the number of spots shown on the uppermost face is
recorded.
E2 = a card is selected at random from a standard deck of playing cards and the
suit is recorded.
E3 = a coin is flipped three times and the occurrences of “heads” and “tails” are
recorded.
The reader should make the distinction between nondeterministic and deterministic
experiments. For instance, given the ordinary calculus function f (x) = 2x + 3,
the experiment “the input x = −1 is selected and the value of the function f (x)
is recorded” is deterministic in that we can easily predict its outcome, in this case
f (−1) = 1.
Let E be a nondeterministic experiment. A sample space Ω associated to E
is a set containing all possible mutually exclusive outcomes of E. For example, a
sample space for experiment E1 is Ω1 = {1, 2, 3, 4, 5, 6}, a sample space for E2 is
Ω2 = {♣, ♦, ♥, ♠}, and a sample space for E3 is

Ω3 = {T T T, T T H, T H T, H T T, T H H, H T H, H H T, H H H}.

© Springer Nature Switzerland AG 2022
R. G. Underwood, Cryptography for Secure Encryption, Universitext,
https://doi.org/10.1007/978-3-030-97902-7_2

A sample space for an experiment is not unique: indeed

Ω′ = {exactly zero heads, exactly one head, exactly two heads, exactly three heads}

is also a sample space for experiment E3.


Let E be a nondeterministic experiment with sample space Ω. An event A is a
subset of Ω.
For experiment E3 and sample space Ω3, A = {T H H, H T H, H H T} ⊆ Ω3 is
the event “a coin is tossed three times and exactly two heads occur.” For experiment
E2 and sample space Ω2, A = {♣, ♥} ⊆ Ω2 is the event “a card is selected at
random and it is a club or a heart.”
Event A occurs if the result of E is an element of A. For example, if the result of
experiment E2 is the card 10♥, then A has occurred. If the card drawn is K♦, then
A has not occurred.
Definition 2.1.1 Let E be a nondeterministic experiment with sample space Ω, and
let A ⊆ Ω be an event. Suppose that E is repeated N times. The relative frequency
of event A is

fN(A) = n(A)/N,

where n(A) is the number of times that event A occurs in the N repetitions.
For example, consider experiment
E1 = a fair die is cast and the number of spots shown on the uppermost face is
recorded.
with sample space Ω1 = {1, 2, 3, 4, 5, 6}. Let A = {4} be the event “a fair die is cast
and the result is 4.” Let B = {5, 6} be the event “a fair die is cast and the result is at
least 5.” Suppose that experiment E1 is repeated 20 times with the following results.

Trial Outcome Trial Outcome


1 5 11 2
2 6 12 5
3 2 13 5
4 3 14 2
5 4 15 4
6 1 16 5
7 4 17 1
8 6 18 4
9 6 19 6
10 2 20 2

Then f20 (A) = 4/20 = 1/5 and f20 (B) = 8/20 = 2/5.
As we shall see in Section 2.6, the relative frequency of an event is a good
approximation of the probability that an event occurs.
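The relative frequencies above can be checked mechanically; a sketch (the trials list transcribes the table):

```python
from collections import Counter

# The 20 outcomes from the table above.
trials = [5, 6, 2, 3, 4, 1, 4, 6, 6, 2, 2, 5, 5, 2, 4, 5, 1, 4, 6, 2]
N = len(trials)
counts = Counter(trials)

f_A = counts[4] / N                 # relative frequency of A = {4}
f_B = (counts[5] + counts[6]) / N   # relative frequency of B = {5, 6}
print(f_A, f_B)  # 0.2 0.4
```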

2.1.1 Abstract Probability Spaces

Let Ω be a non-empty set, and let A, B be subsets of Ω. The union of A, B is the
set

A ∪ B = {x ∈ Ω : x ∈ A or x ∈ B};

the intersection is the set

A ∩ B = {x ∈ Ω : x ∈ A, x ∈ B}.

Let A be a (non-empty) collection of subsets of Ω. Then A is an algebra if the
following conditions hold:
(i) A ∪ B is in A whenever A, B ∈ A.
(ii) Ā is in A whenever A ∈ A. (Here, Ā = Ω\A.)
An algebra A is a σ-algebra if the union ⋃_{i=1}^∞ Ai is in A whenever {Ai}_{i=1}^∞ is a
countable collection of sets in A.
Proposition 2.1.2 Let A be a σ-algebra of subsets of Ω. Then Ω ∈ A and ∅ ∈ A.
Proof Let A ∈ A. Since A is an algebra, Ā = Ω\A ∈ A. Thus, Ω = A ∪ Ā ∈ A.
Now, since Ω ∈ A, Ω̄ = ∅ ∈ A. □

Definition 2.1.3 An abstract probability space is a triple (Ω, A, Pr), where Ω is a
non-empty set, A is a σ-algebra of subsets of Ω, and Pr is a function Pr : A → [0, 1]
that satisfies:
(i) Pr(Ω) = 1.
(ii) Pr(A ∪ B) = Pr(A) + Pr(B) whenever A ∩ B = ∅.
If (Ω, A, Pr) is an abstract probability space, then Ω is the sample space, A is
the set of events, and Pr is the probability function. For event A ∈ A, Pr(A) is
the probability that event A occurs. From Definition 2.1.3(i), we have Pr(Ω) = 1.
Thus from Definition 2.1.3(ii) we conclude that Pr(∅) = 0.
Let (Ω, A, Pr) be an abstract probability space. For A, B ∈ A,

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B). (2.1)

Suppose that A1, A2, . . . , An is a finite collection of n events with Ai ∩ Aj = ∅
for all i ≠ j, and ⋃_{i=1}^n Ai = Ω (that is, suppose that A1, A2, . . . , An is a partition of
Ω). Then

∑_{i=1}^n Pr(Ai) = 1. (2.2)

Note that (2.1) and (2.2) can be proved using Definition 2.1.3.
There is a very special abstract probability space that we will use in cryptography.
Definition 2.1.4 (Classical Definition of Probability) Let Ω be a finite set of n
elements:

Ω = {a1, a2, . . . , an},

and let A be the power set of Ω, that is, A is the collection of all subsets of Ω. We
define a set function Pr : A → [0, 1] by the rule

Pr(A) = |A|/n,

where |A| is the number of elements in A ∈ A. For an event A ⊆ Ω, Pr(A) is the
probability that event A occurs. The triple (Ω, A, Pr) is an abstract probability space
called the classical probability space; the probability function Pr is the classical
probability function.
The classical probability space was introduced by Laplace (1749-1827).
Remark 2.1.5 In Definition 2.1.4, the probability of each singleton point set event
{ai }, 1 ≤ i ≤ n, is 1/n; each “elementary” event is equally likely. The probability
function yields the uniform distribution function (see Example 2.4.1).
The classical definition of probability is used to compute the probabilities
associated to nondeterministic experiments.
Example 2.1.6 Consider again the experiment and sample space:
E1 = a fair die is cast and the number of spots shown on the uppermost face
is recorded; Ω1 = {1, 2, 3, 4, 5, 6}. Let (Ω1, A1, Pr) be the classical probability
space.
Let A = {4}. Then Pr(A) = 1/6. Let B = {5, 6}. Then Pr(B) = 1/3. Also,

Pr(A ∪ B) = Pr(A) + Pr(B) = 1/6 + 1/3 = 1/2.


Example 2.1.7 In this example, we take
E2 = a card is selected at random from a standard deck of playing cards and the
suit is recorded; Ω2 = {♣, ♦, ♥, ♠}.
Let A = the selected card is not a ♠, that is, A = {♣, ♦, ♥}. Then Pr(A) = 3/4
and Pr(Ā) = 1/4. □

Example 2.1.8 We next consider:
E3 = A fair coin is flipped three times and the occurrences of “heads” (H) and
“tails” (T) are recorded;

Ω3 = {T T T, T T H, T H T, H T T, T H H, H T H, H H T, H H H}.

Let A be the event “the last flip is H,” and let B be the event “at least two H have
occurred.” Then Pr(A) = 1/2 and Pr(A ∪ B) = 5/8. □
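The probabilities in Example 2.1.8 follow from the classical definition by direct enumeration; a sketch using exact fractions (the helper pr is illustrative):

```python
from itertools import product
from fractions import Fraction

# The 8 equally likely outcomes of three coin flips.
omega = set(product("TH", repeat=3))

def pr(event):
    """Classical probability: |event| / |omega|."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[-1] == "H"}        # the last flip is H
B = {w for w in omega if w.count("H") >= 2}   # at least two H occur

print(pr(A), pr(A | B))  # 1/2 5/8
```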

Sometimes we have to be very careful how we use Definition 2.1.4 to compute
probabilities. For example, consider the experiment
E = Two fair dice are cast and the sum of the spots shown on the uppermost faces
is recorded, with sample space

Ω = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.

In this case, a naive use of Definition 2.1.4 will lead us astray: for instance,
Pr({5}) = 1/11.
The issue is that the elementary events (singleton subsets) are not equally likely;
the probability distribution is not uniform. This can be easily seen by repeating E
a large number of times and computing the relative frequencies of the singleton
subsets of Ω.
If we think of E as two independent events (one die comes to a stop an instant
before the other die) and use the sample space

Ω = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)},

then we can use Definition 2.1.4 to compute Pr({(i, j)}) = 1/36 for 1 ≤ i, j ≤ 6.
In fact, the event {5} then corresponds to {(1, 4), (2, 3), (3, 2), (4, 1)}, and so
Pr({5}) = 4/36 = 1/9.
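The same computation, done by enumerating the 36 equally likely pairs; a sketch:

```python
from collections import Counter
from fractions import Fraction

# All 36 equally likely ordered pairs of dice results.
pairs = [(i, j) for i in range(1, 7) for j in range(1, 7)]
sums = Counter(i + j for i, j in pairs)

pr_5 = Fraction(sums[5], len(pairs))
print(pr_5)  # 1/9
```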

2.2 Conditional Probability

Let (Ω, A, Pr) be a probability space, and let A, B ∈ A. Assume Pr(A) ≠ 0. The
probability that B occurs given that A has occurred is the conditional probability
of B given A, denoted by Pr(B|A). One has

Pr(B|A) = Pr(A ∩ B)/Pr(A).

For example, consider again the experiment and sample space:
E3 = A fair coin is flipped three times and the occurrences of “heads” and “tails”
are recorded;

Ω3 = {T T T, T T H, T H T, H T T, T H H, H T H, H H T, H H H}.

Let A = the last flip is H, and let B = at least two H have occurred. Then one
computes

Pr(B|A) = Pr(A ∩ B)/Pr(A) = (3/8)/(4/8) = 3/4.

Intuitively, if Pr(B|A) = Pr(B), then the occurrence of B does not depend on the
occurrence of A. Thus B is independent of A. Moreover,

Pr(A ∩ B) = Pr(B|A) Pr(A) = Pr(B) Pr(A).

This motivates the following definition. The events A, B are independent if

Pr(A ∩ B) = Pr(A) Pr(B).

Observe that in experiment E3 with sample space 3 , the events A, B are not
independent since

Pr(B|A) = 3/4 ≠ Pr(B) = 1/2.
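The dependence of A and B in this example can be confirmed by enumeration; a sketch:

```python
from itertools import product
from fractions import Fraction

omega = set(product("TH", repeat=3))          # three coin flips
A = {w for w in omega if w[-1] == "H"}        # last flip is H
B = {w for w in omega if w.count("H") >= 2}   # at least two H

def pr(event):
    return Fraction(len(event), len(omega))

pr_B_given_A = pr(A & B) / pr(A)   # Pr(B|A) = Pr(A ∩ B) / Pr(A)
print(pr_B_given_A, pr(B))         # 3/4 1/2, so A and B are dependent
```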

Here is an important proposition involving conditional probabilities.



Proposition 2.2.1 (Total Probability Theorem) Suppose that A1, A2, . . . , An is
a partition of Ω, and let A be any event. Then

Pr(A) = ∑_{i=1}^n Pr(A|Ai) Pr(Ai).

Proof We have

A = Ω ∩ A = (⋃_{i=1}^n Ai) ∩ A = ⋃_{i=1}^n (Ai ∩ A).

Thus

Pr(A) = Pr(⋃_{i=1}^n (Ai ∩ A)) = ∑_{i=1}^n Pr(Ai ∩ A) = ∑_{i=1}^n Pr(A|Ai) Pr(Ai). □

2.3 Collision Theorems

Collision theorems concern probabilities about whether function values collide or


are the same, and are used in various attacks on cryptosystems.
We present two different collision models.
Suppose S is a finite non-empty set of elements with N = |S|. We think of
S as a sample space Ω of N elements. We assume that (Ω, A, Pr) is the classical
probability space, where Pr is the classical probability function. To simplify
notation, we write y for the singleton subset {y} ⊆ S. We have Pr(y) = 1/N for all
y ∈ S.
Let m ≥ 2 be an integer, and let

f : {1, 2, 3, . . . , m} → S

be a function with f (i) = yi ∈ S for 1 ≤ i ≤ m. We consider the function f as a


sequence of m terms (function values)

y1 , y2 , y3 , . . . , ym

randomly chosen from S (with replacement); y1 is the first term of the sequence and
represents the first choice, y2 is the second term of the sequence representing the
second choice, and so on. As with all sequences, we could have yi = yj for some
i ≠ j.
What is the probability that all of the terms are distinct? (Equivalently, what is
the probability that f is an injection?)
Let (yi = yj) denote the event “yi is equal to yj,” 1 ≤ i, j ≤ m. We have
Pr(y1 = y2) = 1/N, thus Pr(y1 ≠ y2) = 1 − 1/N. So the probability that the first two
terms of the sequence are distinct is 1 − 1/N.
We next consider the first three terms y1, y2, y3. We have the conditional
probability

Pr(((y1 = y3) ∪ (y2 = y3)) | (y1 ≠ y2)) = 1/N + 1/N = 2/N,

since

(y1 = y3) ∩ (y2 = y3) = ∅

whenever y1 ≠ y2. Thus

Pr(((y1 ≠ y3) ∩ (y2 ≠ y3)) | (y1 ≠ y2)) = 1 − 2/N.

Now,

Pr((y1 ≠ y3) ∩ (y2 ≠ y3) ∩ (y1 ≠ y2)) = (1 − 2/N)(1 − 1/N),

and so the probability that the first three terms are distinct is

Pr(y1, y2, y3 distinct) = (1 − 1/N)(1 − 2/N).

Continuing in this manner, we find that the probability that the m terms are distinct
is

Pr(y1, y2, . . . , ym distinct) = (1 − 1/N)(1 − 2/N) · · · (1 − (m−1)/N) = ∏_{i=1}^{m−1} (1 − i/N).

We have proved our first “collision” theorem.



Proposition 2.3.1 Let S be a finite non-empty set with N = |S|. Let y1, y2, . . . , ym
be a sequence of m ≥ 2 terms chosen at random from S (with replacement). Then
the probability that there is a “collision,” that is, the probability that at least two of
the terms are the same, yi = yj for some i ≠ j, is

Pr(yi = yj for some i ≠ j) = 1 − ∏_{i=1}^{m−1} (1 − i/N).

We can obtain a lower bound for the probability of a collision. From the
inequality e^x ≥ 1 + x, valid for all x ∈ R, we obtain e^{−i/N} ≥ 1 − i/N for 1 ≤ i ≤ m − 1.
And so

∏_{i=1}^{m−1} (1 − i/N) ≤ ∏_{i=1}^{m−1} e^{−i/N} = e^{−(1/N) ∑_{i=1}^{m−1} i} = e^{−m(m−1)/(2N)}.

Thus

Pr(yi = yj for some i ≠ j) ≥ 1 − e^{−m(m−1)/(2N)}. (2.3)

Example 2.3.2 (Birthday Paradox) Let S be the set of days in a non-leap year, with
N = |S| = 365. Randomly select a group of m ≥ 2 people, denoted
P1, P2, . . . , Pm. Let

B : {P1, P2, . . . , Pm} → S

be the “birthday function,” where yi = B(Pi) is the birthday of person Pi. The
birthday function corresponds to a sequence of birthdays y1, y2, . . . , ym.
We compute the minimum value for m so that it is likely that at least two people
in the group of m people have the same birthday. In other words, we find the smallest
value of m so that

Pr(yi = yj for some i ≠ j) > 1/2.
For x ∈ R, let ⌈x⌉ denote the smallest integer ≥ x.

From (2.3), we seek the smallest m so that 1 − e^{−m(m−1)/730} > 1/2, or

1/2 > e^{−m(m−1)/730},

or

730 ln(0.5) > −m(m − 1),

or

m(m − 1) ≥ ⌈−730 ln(0.5)⌉ = 506,

thus m = 23.
So our conclusion is the following: if we choose at least 23 people at random,
then it is likely that two of them will have the same birthday. The fact that we obtain
a “collision” of birthdays by choosing as few as 23 people is somewhat surprising,
hence this phenomenon is known as the birthday paradox.
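The exact product from Proposition 2.3.1 (rather than the bound (2.3)) gives the same threshold; a sketch searching for the smallest m:

```python
def collision_prob(m, N):
    """Exact collision probability from Proposition 2.3.1."""
    p_distinct = 1.0
    for i in range(1, m):
        p_distinct *= 1 - i / N
    return 1 - p_distinct

# Smallest group size whose collision probability exceeds 1/2.
m = 2
while collision_prob(m, 365) <= 0.5:
    m += 1
print(m)  # 23
```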


Example 2.3.3 (Square Root Attack) If m = 1 + ⌈√(1.4N)⌉ in Proposition 2.3.1,
then m(m−1)/(2N) > (√(1.4N) · √(1.4N))/(2N) = 0.7, thus −m(m−1)/(2N) < −0.7, hence
e^{−m(m−1)/(2N)} < e^{−0.7}, or

1 − e^{−m(m−1)/(2N)} > 1 − e^{−0.7} > 50%.

Thus choosing a sequence of at least 1 + ⌈√(1.4N)⌉ ≈ √(1.4N) terms in S ensures
that a collision is likely to occur (i.e., a collision occurs with probability > 50%).
Choosing m ≥ 1 + ⌈√(1.4N)⌉ ≈ √(1.4N) to ensure a likely collision is therefore
called the square root attack.
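A numerical sketch of the square root attack rule (the value of N is illustrative):

```python
import math

def sqrt_attack_m(N):
    """Sample size m = 1 + ceil(sqrt(1.4 N)) from the square root attack rule."""
    return 1 + math.ceil(math.sqrt(1.4 * N))

N = 10**6  # illustrative set size
m = sqrt_attack_m(N)
bound = 1 - math.exp(-m * (m - 1) / (2 * N))  # lower bound (2.3)
print(m, bound > 0.5)  # 1185 True
```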


The square root attack is used in the Pollard ρ algorithm (Algorithm 9.3.6)
to attack the RSA public key cryptosystem. It is also used in our discussion of
cryptographic hash functions (Section 10.5).
Here is the second collision model. Again, S is a finite non-empty set of elements
with N = |S|, and we assume that (Ω, A, Pr) is the classical probability space, where
Ω = S and Pr is the classical probability function.
Let n be an integer, 1 ≤ n ≤ N, and let T = {x1, x2, . . . , xn} be a subset of S.
Let m ≥ 1, and let

y1, y2, . . . , ym

be a sequence of m random terms of S. The probability that y1 does not match any
of the elements of T is 1 − n/N. The probability that y2 does not match any of the
elements of T is 1 − n/N, and the probability that neither y1 nor y2 matches any
element of T is (1 − n/N)². The probability that none of the random terms match any
element of T is (1 − n/N)^m. Thus

Pr(yi = xj for some i, j) = 1 − (1 − n/N)^m.

So we have the second collision theorem.
Proposition 2.3.4 Let S be a finite non-empty set, N = |S|. Let n be an integer,
1 ≤ n ≤ N, and let T = {x1, x2, . . . , xn} ⊆ S. Let y1, y2, . . . , ym be a sequence of
m ≥ 1 terms chosen at random from S (with replacement). Then the probability that
there is a “collision,” that is, that at least one term of the sequence matches some
element of T, is

Pr(yi = xj for some i, j) = 1 − (1 − n/N)^m.

Proposition 2.3.4 is used in the Baby Step/Giant Step attack on the Diffie–
Hellman Key Exchange protocol (see Algorithm 12.2.11).
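Proposition 2.3.4 is a one-line formula; a sketch with illustrative values of n, N, and m:

```python
from fractions import Fraction

def match_prob(n, N, m):
    """Probability that m random draws from S hit the n-element subset T."""
    return 1 - (1 - Fraction(n, N)) ** m

p = match_prob(10, 100, 10)  # n, N, m are illustrative
print(float(p))  # ≈ 0.6513
```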

2.4 Random Variables

Random variables will be used to compute the entropy of plaintext English
(Section 3.2) and in the computation of the unicity distance of a cryptosystem
(Section 3.4).
Let (Ω, A, Pr) be a probability space. Let S = {s1, s2, . . . , sm} be a finite set of
elements. A discrete random variable is a function X : Ω → S. Let si ∈ S. By the
notation (X = si), we mean the event

(X = si) = {ω ∈ Ω : X(ω) = si}.

Put pi = Pr(X = si). Then ∑_{i=1}^m pi = 1. The function

fX : S → [0, 1],

defined as

fX(si) = Pr(X = si) = pi,

is the distribution function for the random variable X. Here are two important
examples.

Example 2.4.1 (Uniform Distribution Function) Let
Ω = {a1, a2, . . . , an}, and let A be the power set of Ω. Define Pr(A) = |A|/n for
A ∈ A. (Note: this is the classical definition of probability.) Let S = {1, 2, . . . , n},
and let X : Ω → S be defined as

X(ai) = i,

for 1 ≤ i ≤ n. Then the event

(X = i) = {a ∈ Ω : X(a) = i} = {ai}.

Thus

Pr(X = i) = Pr({ai}) = 1/n, for i = 1, . . . , n.

The function fX : S → [0, 1] defined as

fX(i) = 1/n, ∀i,

is the uniform distribution function. □

For instance, take Ω = {a1, a2, a3, a4, a5}, S = {1, 2, 3, 4, 5} with random
variable X : Ω → S defined as X(ai) = i. Then the uniform distribution function
fX : S → [0, 1] appears as in Figure 2.1.
Let Ω be the sample space defined as Ω = {success, failure}, and let A be the
power set of Ω. Define Pr : A → [0, 1] by Pr(∅) = 0, Pr({success}) = p, 0 ≤ p ≤
1, Pr({failure}) = q = 1 − p, Pr({success, failure}) = 1. (Note that {success} ∪
{failure} = {success, failure}.) Then (Ω, A, Pr) is a probability space.
Let S = {0, 1}, and define a random variable ξ : Ω → S by the rule ξ(success) =
1, ξ(failure) = 0. Then the distribution function fξ : S → [0, 1] is defined as
fξ(1) = Pr(ξ = 1) = Pr({ω ∈ Ω : ξ(ω) = 1}) = Pr({success}) = p, fξ(0) =
Pr(ξ = 0) = Pr({ω ∈ Ω : ξ(ω) = 0}) = Pr({failure}) = 1 − p = q.
We repeat an experiment with sample space

Ω = {success, failure}

n times. The possible outcomes are

Ωn = {success, failure}^n,

Fig. 2.1 Uniform distribution function fX : S → [0, 1]

which consists of all possible sequences of “success” and “failure” of length n. For
example, for n = 5,

ω = success, failure, success, failure, failure

is such a sequence.
Example 2.4.2 (Binomial Distribution Function) With the notation as above, let
S = {0, 1, 2, 3, . . . , n} and define a random variable

ξn : Ωn → S

by the rule

ξn(ω) = the number of occurrences of “success” in sequence ω ∈ Ωn
= the number of favorable outcomes in conducting the experiment n times.

ξn is the binomial random variable. For 0 ≤ k ≤ n, 0 < p < 1, the event

(ξn = k) = {ω ∈ Ωn : there are k occurrences of “success” in ω}

has probability

Pr(ξn = k) = C(n, k) p^k q^{n−k} = C(n, k) p^k (1 − p)^{n−k},

where C(n, k) = n!/(k!(n − k)!) is the binomial coefficient.

Fig. 2.2 Binomial distribution function fξ6 : S → [0, 1]

The corresponding distribution function

fξn : S → [0, 1]

defined as

fξn(k) = Pr(ξn = k) = C(n, k) p^k (1 − p)^{n−k},

for 0 ≤ k ≤ n, is the binomial distribution function. □

For example, if n = 6, p = 1/3, S = {0, 1, 2, 3, 4, 5, 6}, we obtain the binomial
distribution fξ6 : S → [0, 1], with fξ6(k) = C(6, k)(1/3)^k(2/3)^{6−k}. (See Figure 2.2.)
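The binomial distribution of this example can be tabulated directly (binom_pmf is an illustrative helper built on Python’s math.comb):

```python
import math

def binom_pmf(k, n, p):
    """Binomial distribution function: C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

pmf = [binom_pmf(k, 6, 1/3) for k in range(7)]
print(round(sum(pmf), 10))  # 1.0: the probabilities sum to one
print(round(pmf[2], 4))     # 0.3292: k = 2 is the most likely value
```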

2.5 2-Dimensional Random Variables

Let (Ω, A, Pr) be an abstract probability space. A 2-dimensional random variable
is a pairing (X, Y) of random variables X : Ω → S, Y : Ω → T, S =
{s1, s2, . . . , sm}, T = {t1, t2, . . . , tn}. The notation (X = si ∩ Y = tj) denotes
the event

{ω ∈ Ω : X(ω) = si and Y(ω) = tj},

or more simply, both (X = si) and (Y = tj) occur. The joint probability is Pr(X =
si ∩ Y = tj). The conditional probability is Pr(X = si | Y = tj). The function

fX,Y(si, tj) = Pr(X = si ∩ Y = tj)

is the joint distribution function for the 2-dimensional random variable (X, Y).
Note that ∑_{i=1}^m ∑_{j=1}^n fX,Y(si, tj) = 1.
Example 2.5.1 Let (Ω, A, Pr) be an abstract probability space. Let E be the non-
deterministic experiment “a pair of fair dice is cast and the number of spots on the
uppermost face of the first die and on the uppermost face of the second die is recorded.”
Let X : Ω → {1, 2, 3, 4, 5, 6}, Y : Ω → {1, 2, 3, 4, 5, 6} be random variables,
where the event (X = i) is “the uppermost face of the first die has i spots,” and the
event (Y = j) is “the uppermost face of the second die has j spots.” Then the joint
probability is Pr(X = i ∩ Y = j) = 1/36 for 1 ≤ i, j ≤ 6. □

Example 2.5.2 Let (Ω, A, Pr) be an abstract probability space. Let E be the non-
deterministic experiment “a single die is cast and the number of spots on the uppermost
face is recorded.” Let X : Ω → {1, 2, 3, 4, 5, 6}, Y : Ω → {odd, even} be random
variables, where the event (X = i) is “the uppermost face of the die has i spots,”
the event (Y = odd) is “i is odd,” and the event (Y = even) is “i is even.” Then
the joint probability satisfies

Pr(X = i ∩ Y = odd) = 1/6 if i is odd, 0 if i is even;
Pr(X = i ∩ Y = even) = 0 if i is odd, 1/6 if i is even.

Let (X, Y) be a 2-dimensional random variable. Then by the events (X = si),
(Y = tj) we mean

(X = si) = ⋃_{j=1}^n (X = si ∩ Y = tj),

(Y = tj) = ⋃_{i=1}^m (Y = tj ∩ X = si).

Since

(X = si ∩ Y = tj) ∩ (X = si ∩ Y = tk) = ∅

whenever j ≠ k, we have

Pr(X = si) = Pr(⋃_{j=1}^n (X = si ∩ Y = tj)) = ∑_{j=1}^n Pr(X = si ∩ Y = tj) = ∑_{j=1}^n fX,Y(si, tj).

Likewise,

Pr(Y = tj) = ∑_{i=1}^m fX,Y(si, tj).

The marginal distribution functions are defined as

f1(si) = Pr(X = si) = ∑_{j=1}^n fX,Y(si, tj) = ∑_{j=1}^n Pr(X = si ∩ Y = tj),

f2(tj) = Pr(Y = tj) = ∑_{i=1}^m fX,Y(si, tj) = ∑_{i=1}^m Pr(Y = tj ∩ X = si).

The product of marginals distribution function is defined as

p(si, tj) = f1(si) f2(tj).

As one can check,

∑_{i=1}^m ∑_{j=1}^n f1(si) f2(tj) = 1.

The random variables X, Y are independent if

fX,Y(si, tj) = f1(si) f2(tj)

for all si, tj; otherwise, X and Y are dependent.



The random variables in Example 2.5.1 are independent, while the random
variables of Example 2.5.2 are dependent.
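The independence test, comparing the joint distribution with the product of marginals, can be run directly on Example 2.5.2; a sketch with exact fractions:

```python
from fractions import Fraction

# Joint distribution of (X, Y) for the single-die experiment:
# X = number of spots, Y = parity of that number.
joint = {}
for i in range(1, 7):
    for y in ("odd", "even"):
        hit = (i % 2 == 1) == (y == "odd")
        joint[(i, y)] = Fraction(1, 6) if hit else Fraction(0)

# Marginal distribution functions f1, f2.
f1 = {i: sum(joint[(i, y)] for y in ("odd", "even")) for i in range(1, 7)}
f2 = {y: sum(joint[(i, y)] for i in range(1, 7)) for y in ("odd", "even")}

independent = all(joint[(i, y)] == f1[i] * f2[y]
                  for i in range(1, 7) for y in ("odd", "even"))
print(independent)  # False: X and Y are dependent
```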

2.6 Bernoulli’s Theorem

We close this chapter with Bernoulli’s theorem, also known as the Law of Large
Numbers. Bernoulli’s theorem reconciles the relative frequency of an event and the
classical definition of the probability of that event occurring.
Let (Ω, A, Pr) denote the probability space defined in Example 2.4.2. Let n ≥ 1,
and let

$$\xi_n : \Omega_n \to S = \{0, 1, 2, \ldots, n\}$$

denote the binomial random variable. For an integer k, 0 ≤ k ≤ n, we have

$$\Pr(\xi_n = k) = \binom{n}{k} p^k (1-p)^{n-k},$$

where p is the probability that the event {success} occurs, 0 < p < 1. Note that the
relative frequency of the event {success} given n trials is precisely

$$f_n(\{\text{success}\}) = \frac{\xi_n}{n}.$$

At the same time, from the probability space (Ω, A, Pr), we have

$$\Pr(\{\text{success}\}) = p.$$

We relate these two definitions of the probability of the event {success}. We claim
that ξn/n approximates p in the sense that for any ε > 0 and sufficiently large n, the
event

$$\left(\left|\frac{\xi_n}{n} - p\right| < \varepsilon\right) = \left\{\omega \in \Omega_n : \left|\frac{\xi_n(\omega)}{n} - p\right| < \varepsilon\right\}$$

is very nearly certain to occur. That is,

$$\Pr\left(\left|\frac{\xi_n}{n} - p\right| < \varepsilon\right) \approx 1,$$

for n very large. More precisely, we have the following theorem.


26 2 Introduction to Probability

Proposition 2.6.1 (Bernoulli’s Theorem) Let ε > 0. Then

$$\lim_{n\to\infty} \Pr\left(\left|\frac{\xi_n}{n} - p\right| < \varepsilon\right) = 1.$$

Proof For a proof, see [37, Chapter 3, Section 3.5.3]. □



Here is an application of Bernoulli’s Theorem. We take the experiment and
sample space:
E1 = a fair die is cast and the number of spots shown on the uppermost face is
recorded; Ω1 = {1, 2, 3, 4, 5, 6}.
Let A = {4}. Define “success” to be “E1 is conducted and event A occurs.”
Then Pr({success}) = 1/6. Let ξn be the binomial random variable defined as:
ξn is the number of occurrences of {success} in conducting experiment E1 n times.
Bernoulli’s Theorem then says that for each ε > 0,

$$\lim_{n\to\infty} \Pr\left(\left|\frac{\xi_n}{n} - \frac{1}{6}\right| < \varepsilon\right) = 1.$$

In other words,

$$\frac{\xi_n}{n} \approx \frac{1}{6}$$

for very large n.
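Bernoulli’s Theorem is easy to watch in action. The following simulation (my own sketch, not part of the text) casts a fair die n times and prints the relative frequency of the event A = {4}; as n grows, the frequency settles near 1/6 ≈ 0.1667:

```python
import random

def relative_frequency(event, trials, rng):
    """Cast a fair die `trials` times; return the relative frequency of `event`."""
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) in event)
    return hits / trials

rng = random.Random(0)  # fixed seed so the run is reproducible
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency({4}, n, rng))
```

The deviation |ξn/n − 1/6| shrinks roughly like 1/√n, which is why the million-trial estimate sits much closer to 1/6 than the hundred-trial one.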

2.7 Exercises

1. Let E be the nondeterministic experiment:


E = A pair of fair dice are rolled and the sum of the number of spots shown
on the uppermost faces is recorded.
Let Ω = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12} denote the sample space of E.
Suppose experiment E is repeated 20 times with the following results:
Define events as follows: A = the sum is at least 8, B = {6, 7, 11}, C =
{10}.
(a) Compute the relative frequency that A occurs.
(b) Compute the relative frequency that A or B occurs.
(c) Compute the relative frequency that C does not occur.

Trial Outcome Trial Outcome


1 3 11 4
2 6 12 7
3 6 13 7
4 8 14 11
5 7 15 2
6 12 16 5
7 12 17 6
8 9 18 5
9 5 19 7
10 8 20 9

2. Let E be the nondeterministic experiment:


E = A coin is flipped four times and the occurrences “heads” (H ) and “tails”
(T ) are recorded.
Let Ω =
{HHHH, HHHT, HHTH, HHTT, HTHH, HTHT, HTTH,
HTTT, THHH, THHT, THTH, THTT, TTHH,
TTHT, TTTH, TTTT}
denote the sample space of E.
Define events as follows: A = the first flip is H , B = at least two T have
occurred, C = {H H H H, H H T T , T T T T }.
Compute the following probabilities using the classical definition of proba-
bility:
(a) Pr(A)
(b) Pr(A ∪ B)
(c) Pr(B ∩ C)
(d) Pr(C|B)
3. Referring to Exercise 2: Are the events A and B independent? Are the events
B and C independent?
4. Let (Ω, A, Pr) be an abstract probability space. Prove formulas (2.1) and (2.2).
5. Let (Ω, A, Pr) be an abstract probability space, and let A, B be events. Prove
that Pr(B|A) + Pr(B̄|A) = 1.
6. Consider the plaintext message M =
the storyteller makes no choice soon you will not
hear his voice his job is to shed light and not to
master
Compute the relative frequency that the letter a occurs in M. Compute the
relative frequency that the letter e or s occurs.
7. In Example 1.3.1 and Example 1.3.2, Malice uses a method of cryptanalysis
called key trial to determine the key and some plaintext. Let A be the event:
“given ciphertext C, key trial determines the correct key.” Show that Pr(A) =
1/|W(C)|, where

$$W(C) = \{l \in K : d(C, l) \in M\}$$

(cf. Section 3.4).


8. Let S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} be a set of integers. Suppose that a
sequence of 5 elements of S are chosen at random (with replacement). What
is the probability that at least 2 of the terms of the sequence are identical?
9. Prove the formula: ex ≥ 1 + x, ∀x ∈ R.
10. What is the probability that in a random group of 10 people at least two have
the same birthday?
11. Consider the experiment:
E = Two fair dice are cast and the sum of the spots shown on the uppermost
faces is recorded, with sample space

Ω = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.

Prove directly that the elementary events (singleton subsets) are not equally
likely by repeating E a large number of times and computing the relative
frequencies of the singleton subsets of .
12. Let S = {0, 1, 2, . . . , 10}. Compute the binomial distribution function fξ10 :
S → [0, 1], assuming that the success probability is 1/2. Illustrate the probability
distribution using a bar graph.
13. Let X : Ω → {s1, s2, s3}, Y : Ω → {t1, t2, t3} be random variables. Suppose
that the joint probability distribution of the random vector (X, Y ) is given by
the table

t1 t2 t3
s1 1/18 1/9 1/6
s2 1/9 1/9 1/18
s3 1/18 1/18 5/18

Compute the following probabilities:


(a) Pr(X = s3 ∩ Y = t1 )
(b) Pr(X = s2 )
(c) Pr(Y = t1 )
(d) Pr(X = s2 |Y = t1 )
(e) Pr(X = s1 ∪ Y = t3 )
Chapter 3
Information Theory and Entropy

3.1 Entropy

Let (Ω, A, Pr) be a fixed probability space. Let S = {s1, s2, s3, . . . , sn} be a finite
set, and let X : Ω → S be a random variable with distribution function fX : S →
[0, 1], fX(si) = Pr(X = si).
Definition 3.1.1 Let X : Ω → S be a random variable with probability distribution
function fX : S → [0, 1]. Then the entropy of X is

$$H(X) = -\sum_{i=1}^{n} f_X(s_i)\log_2(f_X(s_i)).$$

Entropy is a measure of the amount of information in the random variable X. To


see that entropy is a measure of information, we look at a few examples.
Suppose Alice and Bob want to share a single bit, 0 or 1. In the first scenario,
Alice always chooses 0 and Bob knows that this will always be her choice. In this
case, when Alice sends 0 to Bob, there is no information conveyed from Alice
to Bob. From Bob’s viewpoint, there is nothing noteworthy, interesting, new, or
surprising from this communication with Alice; he knows that Alice will always
send 0, and she does send 0.
The first scenario can be modeled by the random variable X : Ω → {0, 1},
where fX(0) = Pr(X = 0) = 1 and fX(1) = Pr(X = 1) = 0. According to
Definition 3.1.1, the entropy of X is

H (X) = −fX (0) · log2 (fX (0)) − fX (1) · log2 (fX (1))
= −1 · log2 (1) − 0 · log2 (0)
= 0.

© Springer Nature Switzerland AG 2022
R. G. Underwood, Cryptography for Secure Encryption, Universitext,
https://doi.org/10.1007/978-3-030-97902-7_3

(Note: we take the value of 0 · log2 (0) to be 0.) The computed value H (X) = 0
is consistent with what we expect in this scenario; it matches what we intuitively
expect to be the amount of information in X, i.e., no information.
In the second scenario, Alice chooses 0 exactly half of the time and chooses 1
exactly half of the time. Bob has no idea which choice Alice will make beforehand.
When Bob receives Alice’s bit, he has received some information; he is aware of
something new or interesting that he did not know before: the value of Alice’s bit.
In this case, information is conveyed by Alice in her communication with Bob. But
how much information?
This second scenario is modeled by the random variable Y : Ω → {0, 1}, with
fY(0) = Pr(Y = 0) = 1/2 and fY(1) = Pr(Y = 1) = 1/2. We have

$$H(Y) = -f_Y(0)\log_2(f_Y(0)) - f_Y(1)\log_2(f_Y(1)) = -\frac{1}{2}\log_2\left(\frac{1}{2}\right) - \frac{1}{2}\log_2\left(\frac{1}{2}\right) = 1.$$

Now, the computed value H (Y ) = 1 is consistent with what we expect in this


scenario; it matches what we intuitively expect to be the amount of information in
Y : 1 bit of information is conveyed by Alice.
Whether Alice’s bit is 0 or 1 does not matter; regardless, there is 1 bit of
information in Y .
These two scenarios show that information is measured in bits.
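Definition 3.1.1 translates directly into code. A minimal sketch (the helper name `entropy` is my own), with the convention 0 · log2(0) = 0 handled by skipping zero terms:

```python
from math import log2

def entropy(dist):
    """Shannon entropy H(X) = -sum p*log2(p), with 0*log2(0) taken as 0."""
    return -sum(p * log2(p) for p in dist if p > 0)

print(entropy([1.0, 0.0]))   # first scenario: 0 bits of information
print(entropy([0.5, 0.5]))   # second scenario: 1 bit of information
```

The two print statements reproduce H(X) = 0 and H(Y) = 1 from the scenarios above.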
Here is another interpretation of the entropy formula. Let X be a discrete random
variable taking values in the set S = {s1, s2, . . . , sn}. If an event (X = si) has high
probability of occurrence, then we should expect it to contain a small amount of
information (it is not surprising or noteworthy when it occurs). On the other hand,
if an event (X = si ) has a low probability, then we expect it to be attached to a large
amount of information (it is surprising or noteworthy when it occurs).
Observe that the function log2(1/x) = − log2(x) is strictly decreasing on the
interval (0, 1]; see Figure 3.1.
With x = Pr(X = si ), the value

− log2 (Pr(X = si )) = − log2 (fX (si ))

is thus suited to represent the amount of information in the event (X = si) as
measured in bits. The entropy of the random variable,

$$H(X) = -\sum_{i=1}^{n} f_X(s_i)\log_2(f_X(s_i)),$$

is the average information in X.



Fig. 3.1 Graph of y = − log2 (x) on the interval (0, 1]

Here is an example. Suppose 1000 raffle tickets are sold for $1.00 each to
1000 different people, including Alice. One winner is selected at random from
the 1000 ticket holders. The raffle can be modeled by the random variable X :
 → {Alice wins, Alice does not win}. Now, Pr(X = Alice wins) = 1/1000
and Pr(X = Alice does not win) = 999/1000.
The amount of information in the event (X = Alice wins) is

− log2 (1/1000) = log2 (1000) = 9.96578 bits.

From Alice’s viewpoint, this is a large amount of information; it would be quite


newsworthy and surprising to Alice if she won. The amount of information in the
event (X = Alice does not win) is

− log2 (999/1000) = log2 (1000/999) = 0.001443 bits.

This is a small amount of information, which makes sense since it is expected that
Alice will not win. Overall, the average amount of information in the raffle X is

$$\frac{1}{1000}\cdot 9.96578 + \frac{999}{1000}\cdot 0.001443 = 0.011407 \text{ bits}.$$
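The raffle numbers are easy to reproduce. A quick sketch (variable names mine) computing the surprisal −log2(p) of each outcome and their probability-weighted average:

```python
from math import log2

p_win, p_lose = 1 / 1000, 999 / 1000

# Surprisal (information content) of each outcome, in bits.
info_win = -log2(p_win)     # about 9.96578 bits: very surprising
info_lose = -log2(p_lose)   # about 0.001443 bits: expected, so almost no news

# The entropy is the probability-weighted average of the surprisals.
H = p_win * info_win + p_lose * info_lose   # about 0.011407 bits
```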

3.1.1 Entropy and Randomness: Jensen’s Inequality

Given a discrete random variable X taking values in the finite set S, the amount
of information in X is the entropy H (X) in bits. H (X) is also the measure of the

degree of randomness in the distribution fX : S → [0, 1]. To support this notion,


we compute the entropy of two special distributions.
Example 3.1.2 (Minimal Entropy) Let X be a discrete random variable taking
values in the set S = {s1 , s2 , · · · , sn }, n ≥ 1. Suppose that fX (s1 ) = 1 and
fX (si ) = 0 for 2 ≤ i ≤ n. There is no randomness or uncertainty in the distribution:
it is certain that the output will be s1 .
We calculate the entropy to obtain

$$H(X) = -\sum_{i=1}^{n} f_X(s_i)\log_2(f_X(s_i)) = -1\cdot\log_2(1) - \sum_{i=2}^{n} 0\cdot\log_2(0) = 0.$$

The result H (X) = 0 is consistent with our observation that X contains no


randomness; the amount of randomness in X is 0. 

Example 3.1.3 (Maximal Entropy) Let X be a discrete random variable taking


values in the set S = {s1, s2, · · · , sn}, n ≥ 1. Suppose that X is the uniform random
variable, i.e., suppose that fX(si) = 1/n for 1 ≤ i ≤ n. Intuitively, there is a large
amount of randomness or uncertainty in the distribution: each outcome is equally
likely to occur; in fact, the randomness of the uniform random variable is maximal.
We calculate the entropy to obtain

$$H(X) = -\sum_{i=1}^{n} f_X(s_i)\log_2(f_X(s_i)) = -\sum_{i=1}^{n} \frac{1}{n}\log_2\left(\frac{1}{n}\right) = \log_2(n).$$

The result H (X) = log2 (n) is consistent with our observation that X contains a
large amount of randomness.

All of the other random variables X taking values in S = {s1 , s2 , · · · sn } have
entropy (randomness) H (X) between these extremal values; we show that

0 ≤ H (X) ≤ log2 (n).



Fig. 3.2 Graph of a convex function y = f(x); we have f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y)

To this end, we introduce Jensen’s Inequality.


A function f : R → R is convex on the interval (a, b) if for all x, y ∈ (a, b),
0 < λ < 1,

f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y).

For instance, f(x) = 2^x is convex on (−∞, ∞), cf. Figure 3.2.


A function f(x) is concave on (a, b) if −f(x) is convex on (a, b). For example,
f(x) = log2(x) is concave on (0, ∞), as is f(x) = √x.
Proposition 3.1.4 (Jensen’s Inequality) Let f : R → R be convex on (0, ∞).
Suppose that $\sum_{i=1}^{n} a_i = 1$ for ai > 0, 1 ≤ i ≤ n, and that xi > 0 for 1 ≤ i ≤ n.
Then

$$\sum_{i=1}^{n} a_i f(x_i) \geq f\left(\sum_{i=1}^{n} a_i x_i\right).$$

Proof Our proof is by induction on n, with the trivial case being n = 2.


Trivial Case n = 2 Assume that a1 + a2 = 1, ai > 0, thus a2 = 1 − a1 . Since f is
convex on (0, ∞),

a1 f (x1 ) + a2 f (x2 ) ≥ f (a1 x1 + a2 x2 ),

for x1 , x2 ∈ (0, ∞). So the trivial case holds.



Induction Step Assume that Jensen’s inequality holds for n, and suppose that
$\sum_{i=1}^{n+1} a_i = 1$, ai > 0. Then for xi > 0, 1 ≤ i ≤ n + 1,

$$f\left(\sum_{i=1}^{n+1} a_i x_i\right) = f\left(a_{n+1}x_{n+1} + \sum_{i=1}^{n} a_i x_i\right)$$
$$= f\left(a_{n+1}x_{n+1} + (1 - a_{n+1})\frac{1}{1 - a_{n+1}}\sum_{i=1}^{n} a_i x_i\right)$$
$$\leq a_{n+1}f(x_{n+1}) + (1 - a_{n+1})f\left(\frac{1}{1 - a_{n+1}}\sum_{i=1}^{n} a_i x_i\right)$$
$$= a_{n+1}f(x_{n+1}) + (1 - a_{n+1})f\left(\sum_{i=1}^{n} \frac{a_i}{1 - a_{n+1}} x_i\right),$$

by the trivial case. Since $\sum_{i=1}^{n} \frac{a_i}{1 - a_{n+1}} = 1$, the induction hypothesis gives

$$f\left(\sum_{i=1}^{n} \frac{a_i}{1 - a_{n+1}} x_i\right) \leq \sum_{i=1}^{n} \frac{a_i}{1 - a_{n+1}} f(x_i),$$

and so

$$a_{n+1}f(x_{n+1}) + (1 - a_{n+1})f\left(\sum_{i=1}^{n} \frac{a_i}{1 - a_{n+1}} x_i\right) \leq a_{n+1}f(x_{n+1}) + (1 - a_{n+1})\sum_{i=1}^{n} \frac{a_i}{1 - a_{n+1}} f(x_i) = a_{n+1}f(x_{n+1}) + \sum_{i=1}^{n} a_i f(x_i) = \sum_{i=1}^{n+1} a_i f(x_i),$$

which completes the induction proof. □




Proposition 3.1.5 Let X be a discrete random variable taking values in the set
S = {s1 , s2 , . . . , sn }. Then 0 ≤ H (X) ≤ log2 (n).

Proof Note that $\sum_{i=1}^{n} f_X(s_i) = 1$ with fX(si) = Pr(X = si) ≥ 0 for 1 ≤ i ≤ n.
Thus H(X) ≥ 0. Moreover,

$$-H(X) = \sum_{i=1}^{n} f_X(s_i)\log_2(f_X(s_i)) = \sum_{i=1}^{n} f_X(s_i)\, g\!\left(\frac{1}{f_X(s_i)}\right),$$

where g(x) = − log2(x) is convex on (0, ∞). Now by Proposition 3.1.4,

$$-H(X) \geq g\!\left(\sum_{i=1}^{n} f_X(s_i)\frac{1}{f_X(s_i)}\right) = g(n) = -\log_2(n),$$

and hence H(X) ≤ log2(n). □



Example 3.1.6 Let X be a random variable taking values in S = {s1, s2, s3, s4} with
probability distribution

$$f_X(s_1) = \frac{1}{4}, \quad f_X(s_2) = \frac{1}{4}, \quad f_X(s_3) = \frac{1}{10}, \quad f_X(s_4) = \frac{4}{10}.$$

Then

$$H(X) = -\sum_{i=1}^{4} f_X(s_i)\log_2(f_X(s_i)) = -\left(\frac{1}{4}\log_2\left(\frac{1}{4}\right) + \frac{1}{4}\log_2\left(\frac{1}{4}\right) + \frac{1}{10}\log_2\left(\frac{1}{10}\right) + \frac{4}{10}\log_2\left(\frac{4}{10}\right)\right) = 1.86096.$$

Note that a uniform random variable taking values in S has maximum entropy
log2(4) = 2.
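Example 3.1.6 and the bounds of Proposition 3.1.5 can be checked numerically; a quick sketch (the helper `entropy` is my own):

```python
from math import log2

def entropy(dist):
    """Shannon entropy of a list of probabilities (zero terms skipped)."""
    return -sum(p * log2(p) for p in dist if p > 0)

dist = [1 / 4, 1 / 4, 1 / 10, 4 / 10]   # distribution of Example 3.1.6
H = entropy(dist)                        # about 1.86096
H_max = log2(len(dist))                  # uniform case: log2(4) = 2 bits
```

The computed entropy lands strictly between the extremes 0 and log2(4), as Proposition 3.1.5 requires.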


3.2 Entropy of Plaintext English

English plaintext consists of words over an alphabet A of 26 letters:

A = {A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z}.

Let n ≥ 0. An n-gram is a sequence of n letters in A. For example, IBM is a 3-


gram, but so is XVQ. There is only one 0-gram, the empty word of length 0, usually
denoted by ε. For n ≥ 0, let Ln denote the collection of all n-grams over the alphabet
A. Then L0 = {ε},
L1 = {A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z}
L2 = {AA,AB,AC,AD,AE,AF,AG,AH,AI,AJ,AK, AL,AM,AN,AO,AP,AQ,AR,AS
AT,AU,AV,AW,AX,AY,AZ,BA,BB,BC,BD,BE,BF,BG,BH,BI,BJ,BK, BL
BM,BN,BO,BP,BQ,BR,BS,. . . ,ZX,ZY,ZZ}
L3 = {AAA,AAB, AAC,. . . ,ZXZ, ZYZ,ZZZ}, and so on.
Note that |L0| = 1, |L1| = 26, |L2| = 26² = 676, |L3| = 26³ = 17576, and in
general, for n ≥ 0,

$$|L_n| = 26^n.$$

For n ≥ 1, let En be the nondeterministic experiment: “an n-gram is observed


in English plaintext and it is recorded.” Choose Ln for the sample space of En . Let
w ∈ Ln and let {w} be the event: the n-gram w appears in English plaintext.
We repeat experiment En many times and compute the relative frequency fn(w)
that event {w} occurs. We do this by obtaining a very large sample T of English
plaintext. Then

$$f_n(w) = \frac{\text{number of times that } w \text{ occurs in } T}{\text{number of } n\text{-grams in } T}.$$

In this manner, for each n ≥ 1, we obtain the n-gram relative frequency


distributions:

fn : Ln → [0, 1].

For example, if n = 1, we get the 1-gram relative frequency distribution:

f1 (A), f1 (B), f1 (C), . . . , f1 (Z).

In fact, f1 has been calculated for a large sample of (typical) English plaintext T
(Figure 3.3).
In table form, the distribution f1 is given below. The largest 2-gram relative
frequencies are given in Figure 3.4. (For a complete listing of all 676 2-gram
relative frequencies, see [35, Table 2.3.4].)
For n ≥ 1, let Hn denote the entropy of the n-gram relative frequency
distribution; that is,

Hn = − fn (w) log2 (fn (w)).
w∈Ln


Fig. 3.3 1-gram relative frequency distribution f1 : L1 → [0, 1]

Letter Prob. Letter Prob.


A 0.0804 N 0.0709
B 0.0154 O 0.0760
C 0.0306 P 0.0200
D 0.0399 Q 0.0011
E 0.1251 R 0.0612
F 0.0230 S 0.0654
G 0.0196 T 0.0925
H 0.0549 U 0.0271
I 0.0726 V 0.0099
J 0.0016 W 0.0192
K 0.0067 X 0.0019
L 0.0414 Y 0.0173
M 0.0253 Z 0.0009

Using the distribution of Figure 3.3, one computes the 1-gram entropy to be

$$H_1 = -\sum_{w\in L_1} f_1(w)\log_2(f_1(w)) = 4.14.$$

Employing the 2-gram relative frequency distribution f2 : L2 → [0, 1] [35,
Table 2.3.4], one obtains

$$H_2 = -\sum_{w\in L_2} f_2(w)\log_2(f_2(w)) = 7.12.$$
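As a check, the 1-gram entropy can be recomputed directly from the table above (the dictionary below is transcribed from it). Because the tabulated frequencies are rounded to four decimal places and sum to 0.9999, the computed value lands near, though not exactly on, the quoted H1 = 4.14:

```python
from math import log2

# 1-gram relative frequencies of English, transcribed from the table above.
f1 = {'A': 0.0804, 'B': 0.0154, 'C': 0.0306, 'D': 0.0399, 'E': 0.1251,
      'F': 0.0230, 'G': 0.0196, 'H': 0.0549, 'I': 0.0726, 'J': 0.0016,
      'K': 0.0067, 'L': 0.0414, 'M': 0.0253, 'N': 0.0709, 'O': 0.0760,
      'P': 0.0200, 'Q': 0.0011, 'R': 0.0612, 'S': 0.0654, 'T': 0.0925,
      'U': 0.0271, 'V': 0.0099, 'W': 0.0192, 'X': 0.0019, 'Y': 0.0173,
      'Z': 0.0009}

H1 = -sum(p * log2(p) for p in f1.values())
print(round(H1, 2))
```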

[Bar chart showing the relative frequencies of the 2-grams AN, AT, ED, EN, ER, ES, HE, IN, ON, OR, RE, ST, TE, TH, TI.]

Fig. 3.4 Largest 2-gram relative frequencies f2 : L2 → [0, 1]

For each n ≥ 1, we let Hn/n denote the n-gram (relative frequency distribu-
tion) entropy per letter (entropy rate). Consider the sequence of entropy rates:

H1/1, H2/2, H3/3, . . . .

Shannon [54, 55] has found that the limit limn→∞ Hn/n exists and has determined its
value to be

$$H_\infty = \lim_{n\to\infty}\frac{H_n}{n} \approx 1.5 \text{ bits per letter}.$$
This says that on average English plaintext contains 1.5 bits of information per
letter; when we write English it is as if we are using only 2^1.5 ≈ 3 characters
with equal probability of occurrence. Said differently, English plaintext is quite
redundant; we need log2(26) = 4.70044 bits to represent each letter, yet each letter
contains only 1.5 bits of information.
From Example 3.1.3, we see that the maximum amount of information per letter
in a language of words over an alphabet of 26 letters is

log2 (26) = 4.70044.

The redundancy per letter of plaintext English is thus defined to be the difference
between the maximum information rate and the information rate of English:

4.7 − 1.5 = 3.2 bits per letter.



Since the minimum entropy per letter is 0, the maximum redundancy rate per
letter is 4.7. The redundancy ratio of plaintext English is therefore

$$\frac{3.2}{4.7} = 68.09\%.$$

This says that plaintext English can be compressed by 68% without losing any
information.

3.2.1 ASCII Encoding

In order to process data containing plaintext English, a digital computer must


first convert English into a finite sequence of bits or a “block” of bits. The
standard way to do this is to use ASCII (American Standard Code for Information
Interchange). ASCII is an alphabet consisting of 2^8 = 256 characters
numbered from 0 to 255 and written in base 2 as bytes (8-bit strings). The
uppercase letters A, B, . . . , Z correspond to the numbers 65, 66, . . . , 90 written as bytes
01000001, 01000010, . . . , 01011010. Thus,

A ↔ 01000001, B ↔ 01000010, C ↔ 01000011,

and so on.
The lowercase letters a–z correspond to numbers 97–122 written as bytes
01100001–01111010.
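The letter-to-byte correspondence is exactly what Python’s `ord` plus base-2 formatting produce (a quick illustration; the helper name is mine):

```python
def ascii_byte(ch):
    """Return the 8-bit ASCII encoding of a single character as a bit string."""
    return format(ord(ch), '08b')

print(ascii_byte('A'))   # 01000001
print(ascii_byte('B'))   # 01000010
print(ascii_byte('a'))   # 01100001
```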
We want to compute the redundancy rate and redundancy ratio of English when
it is encoded in ASCII. We assume that English is written in uppercase only.
Let Σ denote the ASCII alphabet, and let Ln denote the collection of all n-grams
over Σ, n ≥ 1. For instance,

L1 = {00000000, 00000001, 00000010, . . . , 11111111},

L2 = {00000000 00000000, 00000000 00000001, 00000000 00000010,

. . . , 11111110 11111111, 11111111 11111111}.


The collection of the ordinary 26 uppercase letters of English (encoded as bytes) is
a proper subset of L1; the collection of 26² = 676 2-grams of English letters is a
proper subset of L2, and so on.
Let f1 : L1 → [0, 1] denote the 1-gram relative frequency distribution of
plaintext English encoded in ASCII as bytes. We may assume that f1 (b) = 0 if
b does not correspond to an uppercase English letter. Otherwise, if b corresponds to
an uppercase English letter, then f1 (b) is equal to the accepted relative frequency of
that letter as given in Figure 3.3. Thus f1 (01000001) = 0.0804, f1 (01000010) =
0.0154, and so on.

The 2-gram relative frequency distribution f2 : L2 → [0, 1] is given similarly
using the accepted relative frequency distribution of 2-grams; see Figure 3.4 and
[35, Table 2.3.4].
Now,

$$H_1 = -\sum_{w\in L_1} f_1(w)\log_2(f_1(w)) = 4.14$$

and

$$H_2 = -\sum_{w\in L_2} f_2(w)\log_2(f_2(w)) = 7.12,$$

as computed previously.

Hn
H∞ = lim ≈ 1.5 bits per letter.
n→∞ n

Of course, now a letter is a byte.


Since |Σ| = 2^8, we find that the maximum information rate is log2(2^8) = 8, and
so the redundancy rate of English as encoded in ASCII is

8 − 1.5 = 6.5 bits per letter.

The maximum redundancy rate is 8, and so the redundancy ratio of ASCII-encoded
English is

$$\frac{6.5}{8} = 81.25\% > 68.09\%.$$

We conclude that encoding English in ASCII increases the redundancy ratio of
English.
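The two redundancy ratios are one line of arithmetic each; this sketch simply redoes the computation above for letters and for bytes:

```python
from math import log2

H_inf = 1.5                      # Shannon's entropy rate for English, bits/letter

max_rate_letters = log2(26)      # about 4.70 bits per 26-symbol letter
ratio_letters = (max_rate_letters - H_inf) / max_rate_letters   # about 68.09%

max_rate_ascii = 8               # bits per byte
ratio_ascii = (max_rate_ascii - H_inf) / max_rate_ascii         # exactly 81.25%

print(f'{ratio_letters:.2%} vs {ratio_ascii:.2%}')
```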

3.3 Joint and Conditional Entropy

In this section we include some technical material that is needed in our discussion
of unicity distance (Section 3.4).
Let (Ω, A, Pr) be an abstract probability space, and let X : Ω → S, Y :
Ω → T, S = {s1, s2, . . . , sm}, T = {t1, t2, . . . , tn} be random variables. The joint
probability distribution for X, Y is

$$p_{i,j} = \Pr(X = s_i \cap Y = t_j),$$

for 1 ≤ i ≤ m, 1 ≤ j ≤ n. The joint entropy of X, Y is

$$H(X, Y) = -\sum_{i=1}^{m}\sum_{j=1}^{n} p_{i,j}\log_2(p_{i,j}).$$

The conditional probability distribution for X = si given Y = tj is

$$\Pr(X = s_i \mid Y = t_j),$$

for 1 ≤ i ≤ m, 1 ≤ j ≤ n.
For fixed j, the conditional entropy of X given Y = tj is

$$H(X \mid Y = t_j) = -\sum_{i=1}^{m}\Pr(X = s_i \mid Y = t_j)\log_2(\Pr(X = s_i \mid Y = t_j)),$$

and the conditional entropy of X given Y is

$$H(X \mid Y) = \sum_{j=1}^{n}\Pr(Y = t_j)H(X \mid Y = t_j) = -\sum_{i=1}^{m}\sum_{j=1}^{n}\Pr(Y = t_j)\Pr(X = s_i \mid Y = t_j)\log_2(\Pr(X = s_i \mid Y = t_j)).$$

The conditional entropy of X given Y is an average conditional entropy.


Proposition 3.3.1 H (X) ≥ H (X|Y ), with equality holding if and only if X and Y
are independent.
Proof We show that H(X) − H(X|Y) ≥ 0. By the Total Probability Theorem
(Proposition 2.2.1), $\sum_{j=1}^{n}\Pr(Y = t_j \mid X = s_i) = 1$. Thus

$$H(X) = -\sum_{i=1}^{m}\Pr(X = s_i)\log_2(\Pr(X = s_i)) = -\sum_{i=1}^{m}\sum_{j=1}^{n}\Pr(X = s_i)\Pr(Y = t_j \mid X = s_i)\log_2(\Pr(X = s_i)).$$

Thus H(X) − H(X|Y)

$$= -\sum_{i=1}^{m}\sum_{j=1}^{n}\Pr(X = s_i)\Pr(Y = t_j \mid X = s_i)\log_2(\Pr(X = s_i)) + \sum_{i=1}^{m}\sum_{j=1}^{n}\Pr(Y = t_j)\Pr(X = s_i \mid Y = t_j)\log_2(\Pr(X = s_i \mid Y = t_j))$$

$$= \sum_{i=1}^{m}\sum_{j=1}^{n}\Pr(X = s_i \cap Y = t_j)\big(\log_2(\Pr(X = s_i \mid Y = t_j)) - \log_2(\Pr(X = s_i))\big)$$

$$= \sum_{i=1}^{m}\sum_{j=1}^{n}\Pr(X = s_i \cap Y = t_j)\log_2\left(\frac{\Pr(X = s_i \mid Y = t_j)}{\Pr(X = s_i)}\right)$$

$$= \sum_{i=1}^{m}\sum_{j=1}^{n}\Pr(X = s_i \cap Y = t_j)\log_2\left(\frac{\Pr(X = s_i \cap Y = t_j)}{\Pr(X = s_i)\Pr(Y = t_j)}\right)$$

$$= \sum_{i=1}^{m}\sum_{j=1}^{n}\Pr(X = s_i \cap Y = t_j)\,g\!\left(\frac{\Pr(X = s_i)\Pr(Y = t_j)}{\Pr(X = s_i \cap Y = t_j)}\right),$$

where g(x) = − log2(x). Now by Proposition 3.1.4,

$$H(X) - H(X \mid Y) \geq g\!\left(\sum_{i=1}^{m}\sum_{j=1}^{n}\Pr(X = s_i \cap Y = t_j)\frac{\Pr(X = s_i)\Pr(Y = t_j)}{\Pr(X = s_i \cap Y = t_j)}\right) = g\!\left(\sum_{i=1}^{m}\sum_{j=1}^{n}\Pr(X = s_i)\Pr(Y = t_j)\right) = g(1) = 0. \qquad\square$$



The non-negative quantity H (X) − H (X|Y ) is the mutual information between
X and Y , denoted as I (X; Y ). It is the reduction in the randomness of X due to the
knowledge of Y ; I (X; Y ) = 0 if and only if X and Y are independent.
Example 3.3.2 Let X and Y be the random variables of Example 2.5.1. Then
I (X; Y ) = 0 since the random variables are independent. 

Example 3.3.3 Let X and Y be the random variables of Example 2.5.2. Then

$$H(X) = \log_2(6),$$

$$H(X \mid Y = \text{odd}) = -\sum_{i=1}^{6}\Pr(X = i \mid Y = \text{odd})\log_2(\Pr(X = i \mid Y = \text{odd})) = \log_2(3),$$

$$H(X \mid Y = \text{even}) = -\sum_{i=1}^{6}\Pr(X = i \mid Y = \text{even})\log_2(\Pr(X = i \mid Y = \text{even})) = \log_2(3),$$

and

$$H(X \mid Y) = \Pr(Y = \text{odd})H(X \mid Y = \text{odd}) + \Pr(Y = \text{even})H(X \mid Y = \text{even}) = \frac{1}{2}\log_2(3) + \frac{1}{2}\log_2(3) = \log_2(3).$$

And so the amount of mutual information between X and Y is

$$I(X; Y) = \log_2(6) - \log_2(3) = \log_2(2) = 1.$$
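Example 3.3.3 can be verified directly from the joint table of Example 2.5.2. A sketch (helper names are mine) that computes the marginals, the conditional entropy H(X|Y), and the mutual information:

```python
from math import log2

# Joint distribution of (X, Y) from Example 2.5.2: a die and its parity.
joint = {(i, 'odd' if i % 2 else 'even'): 1 / 6 for i in range(1, 7)}

def H(dist):
    """Shannon entropy of an iterable of probabilities (zero terms skipped)."""
    return -sum(p * log2(p) for p in dist if p > 0)

pX, pY = {}, {}                      # marginal distributions of X and Y
for (s, t), p in joint.items():
    pX[s] = pX.get(s, 0) + p
    pY[t] = pY.get(t, 0) + p

# H(X|Y) = sum over t of Pr(Y = t) * H(X | Y = t)
H_X_given_Y = sum(
    pY[t] * H([p / pY[t] for (s, u), p in joint.items() if u == t])
    for t in pY)

# Mutual information I(X;Y) = H(X) - H(X|Y); here log2(6) - log2(3) = 1 bit
I = H(pX.values()) - H_X_given_Y
```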

Proposition 3.3.4 (The Chain Rule) H(X, Y) = H(Y) + H(X|Y).

Proof By the Total Probability Theorem (Proposition 2.2.1),
$\sum_{i=1}^{m}\Pr(X = s_i \mid Y = t_j) = 1$. Thus

$$H(Y) = -\sum_{j=1}^{n}\Pr(Y = t_j)\log_2(\Pr(Y = t_j)) = -\sum_{i=1}^{m}\sum_{j=1}^{n}\Pr(Y = t_j)\Pr(X = s_i \mid Y = t_j)\log_2(\Pr(Y = t_j)).$$

And so, H(Y) + H(X|Y)

$$= -\sum_{i=1}^{m}\sum_{j=1}^{n}\Pr(Y = t_j)\Pr(X = s_i \mid Y = t_j)\log_2(\Pr(Y = t_j)) - \sum_{i=1}^{m}\sum_{j=1}^{n}\Pr(Y = t_j)\Pr(X = s_i \mid Y = t_j)\log_2(\Pr(X = s_i \mid Y = t_j))$$

$$= -\sum_{i=1}^{m}\sum_{j=1}^{n}\Pr(Y = t_j)\Pr(X = s_i \mid Y = t_j)\big(\log_2(\Pr(Y = t_j)) + \log_2(\Pr(X = s_i \mid Y = t_j))\big)$$

$$= -\sum_{i=1}^{m}\sum_{j=1}^{n}\Pr(Y = t_j)\Pr(X = s_i \mid Y = t_j)\log_2\big(\Pr(Y = t_j)\Pr(X = s_i \mid Y = t_j)\big)$$

$$= H(X, Y). \qquad\square$$
Proposition 3.3.5 H (X) + H (Y ) ≥ H (X, Y ), with equality holding if and only if
X and Y are independent.
Proof By Proposition 3.3.1, H (X) ≥ H (X|Y ). Thus by Proposition 3.3.4, H (X) ≥
H (X, Y ) − H (Y ). 
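Propositions 3.3.4 and 3.3.5 can be spot-checked on any joint distribution. Here is a sketch using an arbitrary 2×2 joint table (the numbers are my own, chosen so that X and Y are dependent):

```python
from math import log2

joint = {('s1', 't1'): 0.4, ('s1', 't2'): 0.1,
         ('s2', 't1'): 0.2, ('s2', 't2'): 0.3}

def H(ps):
    """Shannon entropy of an iterable of probabilities (zero terms skipped)."""
    return -sum(p * log2(p) for p in ps if p > 0)

pX, pY = {}, {}                      # marginal distributions
for (s, t), p in joint.items():
    pX[s] = pX.get(s, 0) + p
    pY[t] = pY.get(t, 0) + p

H_XY = H(joint.values())             # joint entropy H(X, Y)
H_X_given_Y = sum(                   # conditional entropy H(X|Y)
    pY[t] * H([p / pY[t] for (s, u), p in joint.items() if u == t])
    for t in pY)

# Chain Rule:      H(X, Y) = H(Y) + H(X|Y)
# Proposition 3.3.5: H(X) + H(Y) >= H(X, Y), equality iff X, Y independent
```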


3.4 Unicity Distance

Suppose Alice is communicating with Bob using the symmetric cryptosystem

⟨M, C, e, d, K⟩.

Here M is plaintext English, C is the collection of all possible ciphertext messages,


and K is the keyspace, which is the collection of all possible keys used for
encryption and decryption.
Malice would like to break the cryptosystem with a ciphertext only attack, using
the brute-force method of key trial (Section 1.3), i.e., Malice will compute d(C, k)
for every possible key k.
Malice wonders: how much ciphertext needs to be captured in order to uniquely
determine the key? What is the minimal length of ciphertext that guarantees that
there are no spurious keys?

As we saw in Example 1.3.1, Malice’s interception of 4 characters of ciphertext


Egnw is enough to determine the key k = 18. On the other hand, in Example 1.3.2,
Malice’s knowledge of 3 characters of ciphertext MFQ is not enough to uniquely
determine the key, though Malice has learned that the key must be either 4, 5, 12, or
13. The correct key is k = 4 and 5, 12, and 13 are spurious keys.
Essentially, Malice wants to know how much ciphertext to capture so that the
number of spurious keys goes to 0. Our goal in this section is to obtain a lower bound
for this value; this lower bound is called the unicity distance of the cryptosystem.
We need to introduce some formal notation.
Let (Ω, A, Pr) be an abstract probability space. Let n ≥ 1 and let Mn : Ω →
Ln ∩ M, Cn : Ω → Ln ∩ C, and K : Ω → K denote random variables. Then

$$(M_n = M) = \{\omega \in \Omega : M_n(\omega) = M\},$$

M ∈ Ln ∩ M, denotes the event “the plaintext message is the n-gram M”;


Pr(Mn = M) is the probability that event (Mn = M) occurs, i.e., Pr(Mn = M)
is the probability that the plaintext message is the n-gram M.
Likewise,

$$(C_n = C) = \{\omega \in \Omega : C_n(\omega) = C\},$$

C ∈ Ln ∩ C, denotes the event “the ciphertext is the n-gram C”; Pr(Cn = C) is


the probability that event (Cn = C) occurs. In other words, Pr(Cn = C) is the
probability that the ciphertext is the n-gram C. Finally,

$$(K = k) = \{\omega \in \Omega : K(\omega) = k\},$$

k ∈ K, denotes the event “Alice and Bob have chosen k as the encryption–decryption
key”; Pr(K = k) is the probability that the encryption–decryption key is k.
Let C ∈ Ln ∩ C be ciphertext of length n and let

W (C) = {k ∈ K : d(C, k) ∈ M}.

Then W (C) is the set of keys k for which the decryption of C with k results in
a meaningful or legible word (or words) in plaintext. Clearly, |W (C)| ≥ 1, since
there is always a correct key k with M = d(C, k), where M is the original intended
plaintext.
A spurious key is a key k ∈ K for which d(C, k) = M ∈ M, where M
is legitimate plaintext other than the original, intended plaintext. The number of
spurious keys is |W (C)| − 1.
For instance, in Example 1.3.1, W (Egnw) = {18} and there are no spurious keys.
In Example 1.3.2, W (MFQ) = {4, 5, 12, 13} and {5, 12, 13} are spurious keys.

The average number of spurious keys over all C ∈ Ln ∩ C is

$$\text{Spur}(n) = \sum_{C\in L_n\cap C}\Pr(C_n = C)(|W(C)| - 1).$$

Intuitively,

lim Spur(n) = 0,
n→∞

and thus (theoretically) there exists a smallest integer n0 ≥ 1 for which

Spur(n0 ) = 0.

Thus, n0 is the smallest positive integer so that

|W (C)| − 1 = 0

for all C ∈ Ln0 ∩ C. That is, n0 is the smallest positive integer so that for each C ∈ C
of length n0 , there is a unique key k that maps C back to a message M ∈ M.
We seek a lower bound for n0 ; this lower bound will be the unicity distance of
the cryptosystem.
We first find a lower bound for Spur(n). To this end, we prove two lemmas.
Lemma 3.4.1 H (K, Cn ) = H (K) + H (Mn ).
Proof We view (K, Mn ) as a 2-dimensional random variable. By Proposition 3.3.4,

H (Cn , (K, Mn )) = H (K, Mn ) + H (Cn |(K, Mn )).

We have H (Cn |(K, Mn )) = 0 because the ciphertext is determined by the key


and the plaintext. Thus, H (Cn , (K, Mn )) = H (K, Mn ). Of course, K and Mn are
independent, and so,

H (Cn , (K, Mn )) = H (K) + H (Mn )

by Proposition 3.3.5.
We next view (K, Cn ) as a 2-dimensional random variable. By Proposition 3.3.4,

H (Mn , (K, Cn )) = H (K, Cn ) + H (Mn |(K, Cn )).

We have H (Mn |(K, Cn )) = 0 because knowledge of the key and the ciphertext
yields the correct plaintext (the cryptosystem works). Thus,

H (Mn , (K, Cn )) = H (K, Cn ).

The result follows. 




The key equivocation is the conditional entropy H(K|Cn). Key equivocation is
a measure of the randomness of the key given that an n-gram of ciphertext has been
provided.
Lemma 3.4.2 H (K|Cn ) = H (K) + H (Mn ) − H (Cn ).
Proof By the Chain Rule (Proposition 3.3.4), H (K|Cn ) + H (Cn ) = H (K, Cn ). By
Lemma 3.4.1, H (K|Cn ) + H (Cn ) = H (K) + H (Mn ), which gives the result.  
Here is the lower bound on Spur(n).

Proposition 3.4.3 Let Spur(n) be the average number of spurious keys over all
C ∈ Ln ∩ C. Then

$$\text{Spur}(n) \geq \frac{|K|}{2^{3.2n}} - 1,$$

for n ≥ 1.
Proof For n ≥ 1,

$$\text{Spur}(n) = \sum_{C\in L_n\cap C}\Pr(C_n = C)(|W(C)| - 1) = \sum_{C\in L_n\cap C}\Pr(C_n = C)|W(C)| - \sum_{C\in L_n\cap C}\Pr(C_n = C) = \sum_{C\in L_n\cap C}\Pr(C_n = C)|W(C)| - 1,$$

so that

$$\text{Spur}(n) + 1 = \sum_{C\in L_n\cap C}\Pr(C_n = C)|W(C)|,$$

thus

$$\log_2(\text{Spur}(n) + 1) = \log_2\left(\sum_{C\in L_n\cap C}\Pr(C_n = C)|W(C)|\right).$$

By Proposition 3.1.4,

$$\log_2(\text{Spur}(n) + 1) \geq \sum_{C\in L_n\cap C}\Pr(C_n = C)\log_2(|W(C)|).$$

An application of Proposition 3.1.5 yields log2(|W(C)|) ≥ H(K|Cn = C) (since
K|(Cn = C) is a random variable taking values in W(C)), thus

$$\log_2(\text{Spur}(n) + 1) \geq \sum_{C\in L_n\cap C}\Pr(C_n = C)H(K|C_n = C) = H(K|C_n) = H(K) + H(M_n) - H(C_n),$$

by Lemma 3.4.2.
In Section 3.2, we computed

$$H_\infty = \lim_{n\to\infty}\frac{H_n}{n} = \lim_{n\to\infty}\frac{H(M_n)}{n}.$$

Thus, for large n, H(Mn) ≈ nH∞. Moreover, we may assume that

$$H(K) = \log_2(|K|).$$

Consequently,

$$H(K) + H(M_n) - H(C_n) \approx \log_2(|K|) + nH_\infty - H(C_n).$$

Now by Proposition 3.1.5,

$$\log_2(|K|) + nH_\infty - H(C_n) \geq \log_2(|K|) + nH_\infty - n\log_2(26) = \log_2(|K|) - n(\log_2(26) - H_\infty).$$

Thus,

$$\log_2(\text{Spur}(n) + 1) \geq \log_2(|K|) - n(\log_2(26) - H_\infty),$$

and so,

$$\text{Spur}(n) \geq \frac{|K|}{2^{n(\log_2(26) - H_\infty)}} - 1 = \frac{|K|}{2^{3.2n}} - 1$$

(recall H∞ = 1.5, as computed in Section 3.2). □




By definition, n0 ≥ 1 is the smallest integer for which Spur(n0) = 0. To obtain a
lower bound for n0, we find the unique m ∈ R for which

$$0 = \frac{|K|}{2^{3.2m}} - 1,$$

thus

$$m = \frac{\log_2(|K|)}{3.2}. \tag{3.1}$$

We then have

$$n_0 \geq m,$$

for if not, then

$$\text{Spur}(n_0) \geq \frac{|K|}{2^{3.2 n_0}} - 1 > 0.$$
Definition 3.4.4 Let ⟨M, C, e, d, K⟩ be a symmetric key cryptosystem that
encrypts plaintext English. The value

$$m = \frac{\log_2(|K|)}{3.2}$$

is the unicity distance of the cryptosystem.
This value for m is the best lower bound possible for the smallest integer n0 with
Spur(n0) = 0. Any integer larger than m would not be optimal, and an integer m′
smaller than m would result in the inequality

$$\text{Spur}(m') \geq \frac{|K|}{2^{3.2 m'}} - 1 > 0,$$

indicating at least one spurious key.
So, if |K| = 26 (as in our right shift cryptosystem), we compute

unicity distance = log2(26)/3.2 = 4.7/3.2 ≈ 1.47 characters of ciphertext.

For the shift cipher, we therefore deduce that n0 ≥ 1.47, and as we have seen in
Example 1.3.2, the actual value for n0 is larger than 1.47.

Now suppose Alice and Bob are using the simple substitution cryptosystem (see
Example 8.1.2) with 26! ≈ 4 · 10^26 keys. Then the unicity distance is

log2(26!)/3.2 ≈ 28 characters of ciphertext.
Thus for the simple substitution cryptosystem, we conclude that n0 ≥ 28; again,
the exact value for n0 might be significantly larger than 28. If Malice captures 28
characters of ciphertext C and performs a brute-force key trial, there may still be
spurious keys for C.
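These two computations are easy to check numerically. The sketch below assumes only Definition 3.4.4; the helper name `unicity_distance` and the default redundancy 3.2 (bits per letter of plaintext English) are ours, not the book's.

```python
import math

def unicity_distance(keyspace_size, redundancy=3.2):
    """Definition 3.4.4: m = log2(|K|) / redundancy, with the redundancy
    measured in bits per letter (3.2 for plaintext English)."""
    return math.log2(keyspace_size) / redundancy

# Shift cipher, |K| = 26:
print(round(unicity_distance(26), 2))                 # 1.47 characters

# Simple substitution, |K| = 26!:
print(round(unicity_distance(math.factorial(26))))    # 28 characters
```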
On the other hand, suppose that the plaintext M consists of words from a
language with a redundancy of 0 bits per letter. Then, assuming |K| finite,

unicity distance = log2(|K|)/0 = ∞.
Consequently, n0 = ∞, and it is impossible to uniquely determine the key using
key trial.
Since English has redundancy > 0, every cryptosystem with a finite keyspace
has a finite unicity distance.

3.5 Exercises

1. Suppose that Alice and Bob wish to exchange a single bit, either 0 or 1. The
probability that Alice selects 0 is 3/4, and the probability that Alice selects 1
is 1/4. After choosing her bit, Alice then sends the bit to Bob. We model this
exchange using a random variable X : Ω → {0, 1}, where fX (0) = Pr(X =
0) = 3/4 and fX (1) = Pr(X = 1) = 1/4.
Compute the entropy H (X) of the random variable X, i.e., compute the
amount of information in the transmission of Alice’s bit.
2. Suppose that Alice and Bob wish to exchange one of the four letters, either a, b,
c, or d. The probability that Alice selects a is 9/10, the probability that Alice
selects b is 1/30, the probability that Alice selects c is 1/30, and the probability
that Alice selects d is 1/30.
After choosing her letter, Alice then sends the letter to Bob. We model this
exchange using a random variable X : Ω → {a,b,c,d}, where fX (a) = Pr(X =
a) = 9/10, fX (b) = Pr(X = b) = 1/30, fX (c) = Pr(X = c) = 1/30, and
fX (d) = Pr(X = d) = 1/30.
Compute the amount of information (entropy) in the random variable X.
3. Let S = {0, 1, 2, 3, 4, 5, 6}, and let ξ6 : Ω6 → S denote the binomial random
variable with binomial distribution function fξ6 : S → [0, 1] defined as

fξ6(k) = (6 choose k) (1/3)^k (2/3)^(6−k).

(Note that the probability of “success” is 1/3.) Compute the entropy H (ξ6 ) of the
random variable ξ6 .
4. Let f1 : L1 → [0, 1] be the 1-gram relative frequency distribution of plaintext
English, as given in Section 3.2. Let

H1 = − Σ_{w∈L1} f1(w) log2(f1(w))

denote the entropy of the distribution. Write a computer program that computes
H1 .
5. Let r be a real number, 0 ≤ r ≤ 1. Let X : Ω → {s1 , s2 } be a random variable
with fX (s1 ) = p, fX (s2 ) = 1 − p, for some p, 0 ≤ p ≤ 1. Show that there
exists a value for p so that H (X) = r.
6. Let X : Ω → {s1 , s2 , s3 } and Y : Ω → {t1 , t2 , t3 } be random variables. Suppose
that the joint probability distribution of the random vector (X, Y ) is given by the
table

      t1     t2     t3
s1   1/18   1/9    1/6
s2   1/9    1/9    1/18
s3   1/18   1/18   5/18

Compute the following:


(a) H (X)
(b) H (X|Y = t1 )
(c) H (X|Y )
(d) The mutual information, I (X; Y )
7. Suppose plaintext English is encrypted using the cryptosystem ⟨M, C, e, d, K⟩
where |K| = 4096. Compute the unicity distance of this cryptosystem.
8. Suppose the redundancy of natural language L is 1.2 bits per letter. Plaintext
messages written in L are encrypted using cryptosystem ⟨M, C, e, d, K⟩. In
which scenario below does Malice have a better chance of defeating the
cryptosystem with a ciphertext only attack (assuming infinite computing power)?
Scenario 1: Malice intercepts 20 characters of ciphertext encrypted using a
cryptosystem with |K| = 33554432.
Scenario 2: Malice intercepts 10 characters of ciphertext encrypted using a
cryptosystem with |K| = 1024.
9. Suppose that the plaintext English alphabet {A, B, C, . . . , Z} is encoded
as 5-bit blocks as follows: A ↔ 00000, B ↔ 00001, C ↔ 00010, . . . ,
Z ↔ 11001. As a result of this encoding, plaintext English can be viewed as
a collection of words over the alphabet {0, 1}; an n-gram is a bit string of length
n.

(a) Compute the information rate per bit of plaintext English encoded as above,
i.e., compute lim_{n→∞} Hn/n, where Hn is the entropy of the n-gram relative
frequency distribution. Hint: we know that lim_{n→∞} H5n/n = 1.5. Use this fact
to compute lim_{n→∞} H5n/(5n), and from this deduce lim_{n→∞} Hn/n.
(b) Use part (a) to find the redundancy rate per bit and the redundancy ratio. How
does this compare with the standard redundancy ratio of 68%?
Chapter 4
Introduction to Complexity Theory

4.1 Basics of Complexity Theory

Suppose Alice and Bob are using the cryptosystem ⟨M, C, e, d, Ke , Kd ⟩ to com-
municate securely. Recall from Definition 1.2.1 that a cryptosystem is a system

⟨M, C, e, d, Ke , Kd ⟩,

where M is a collection of plaintext messages, C is the corresponding set of


ciphertext, e is an encryption transformation with keyspace Ke , and d is the
decryption transformation with keyspace Kd .
Encryption of messages in M can be viewed as a function f : M → C; the
ciphertext is given as f (M) = C where M ∈ M is a message.
In order for the cryptosystem to be secure it should be very hard for Malice to
decrypt the ciphertext C without knowledge of the decryption key kd . In other words,
without knowledge of kd , it should be hard to find an inverse function f −1 for which
f −1 (C) = M.
Complexity theory seeks to determine the inherent difficulty of a problem by
measuring the amount of computer resources (e.g., time or space) that are required
for a computer to solve a problem.
The problems we are interested in solving involve the computation of function
values, e.g., f (M) = C, f −1 (C) = M. We write algorithms to compute functions.
Generally, an algorithm is a method, expressed as a finite list of well-defined
instructions, which is used to compute a function value.
We want to measure how efficiently an algorithm computes the value of a
function. The running time on input x of an algorithm is the number of steps
the algorithm takes to compute the function value f (x). The running time of an
algorithm usually depends on the size of the input.

© Springer Nature Switzerland AG 2022 53


R. G. Underwood, Cryptography for Secure Encryption, Universitext,
https://doi.org/10.1007/978-3-030-97902-7_4

When evaluating the performance of an algorithm in terms of its running time,


we always consider the worst possible case (we consider inputs which result in
the maximum run time). Moreover, since our algorithms will be implemented on a
digital computer, we assume that the size of the input is given in terms of bits.
Let R denote the set of real numbers. For x ∈ R, let ⌊x⌋ denote the largest integer
≤ x (⌊·⌋ is the floor function). If the input of an algorithm is an integer n ≥ 1, then
its size in terms of bits is

m = ⌊log2(n)⌋ + 1.

For example, if the input is n = 17, then its size in terms of bits is

m = ⌊log2(17)⌋ + 1 = 4 + 1 = 5 bits,

indeed, (17)2 = 10001. (Note: if n is a decimal integer, we sometimes denote its
binary representation as (n)2 .)
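The formula m = ⌊log2(n)⌋ + 1 agrees with Python's built-in `int.bit_length`, which gives a quick way to check examples such as n = 17 (a throwaway sketch; the helper name `bit_size` is ours):

```python
import math

def bit_size(n):
    """Number of bits in the binary representation of n >= 1:
    m = floor(log2(n)) + 1."""
    return math.floor(math.log2(n)) + 1

print(bit_size(17), bin(17))             # 5 0b10001
assert bit_size(17) == (17).bit_length() # same answer as the built-in
```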
A convenient way to measure the running time of an algorithm is to use order
notation.
Definition 4.1.1 (Order Notation) Let R+ = {x ∈ R : x > 0} denote the set of
positive real numbers. Let f : R+ → R+ and g : R+ → R+ be functions. Then f
is of order g, denoted as

f (x) = O(g(x)),

if there exist real numbers M, N ∈ R+ for which f (x) ≤ Mg(x) for all x ≥ N .
For example, the polynomial function p(n) = n² + 2n + 3 is of order n², that is,
p(n) = O(n²), since p(n) ≤ 3n² for n ≥ 2.
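The witnesses M = 3, N = 2 in this example can be checked by brute force over a range of n (our own sanity check, not from the text):

```python
# Verify p(n) = n^2 + 2n + 3 <= 3*n^2 for 2 <= n < 10000, witnessing
# p(n) = O(n^2) with M = 3 and N = 2 in Definition 4.1.1.
def p(n):
    return n * n + 2 * n + 3

assert all(p(n) <= 3 * n * n for n in range(2, 10_000))
print("p(n) <= 3n^2 holds for all checked n >= 2")
```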
In the next section we consider “polynomial time” algorithms, algorithms whose
running time is of the order of a polynomial, usually of relatively low degree. Such
algorithms are considered to be practical since they can be implemented efficiently
on a digital computer.

4.2 Polynomial Time Algorithms

Definition 4.2.1 An algorithm is a polynomial time algorithm if its running time


t (m) satisfies t (m) = O(p(m)) for some polynomial p(m) with integer coefficients,
where m is the size of the input of the algorithm as measured in bits.
The fundamental operations of arithmetic (+, −, ×, ÷) are given by algorithms
that run in polynomial time, when the input size is measured in bits. This shows that
these operations can be performed efficiently on a computer.

In the algorithms below we use the notation x ← y to denote that x is assigned


the value y.
Algorithm 4.2.2 (BIN_INT_ADD)
Input: integers a ≥ b ≥ 0 written in binary as a = am · · · a2 a1 ,
b = bm · · · b2 b1
Output: the function value f (a, b) = a + b = 1dm dm−1 · · · d1
or f (a, b) = a + b = dm dm−1 · · · d1 in binary
Algorithm:
c←0
for i = 1 to m
if ai + bi + c = 1 or 3
then di ← 1
else di ← 0
if ai + bi + c ≥ 2
then c ← 1
else c ← 0
next i
if c = 1
then output 1dm dm−1 · · · d1
else output dm dm−1 · · · d1


To see that Algorithm 4.2.2 works, we compute 7 + 5, or in binary, 111 + 101.


In this case, m = 3. We have a1 + b1 + c = 2 and so, d1 = 0 and c = 1. Next,
a2 + b2 + c = 2, and so d2 = 0 and c = 1. Finally, a3 + b3 + c = 3, thus d3 = 1,
c = 1. Since c = 1, we output 1100, which is the correct sum in binary.
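Algorithm 4.2.2 translates almost line for line into Python; the only bookkeeping is that the text indexes bits from the least significant end (a1), so the strings are reversed internally. This transcription and its helper name are ours:

```python
def bin_int_add(a: str, b: str) -> str:
    """Algorithm 4.2.2 (BIN_INT_ADD) for binary strings with a >= b."""
    m = len(a)
    ar, br = a[::-1], b.zfill(m)[::-1]   # a_1 ... a_m, least significant first
    c = 0                                # carry bit
    d = []
    for i in range(m):
        s = int(ar[i]) + int(br[i]) + c
        d.append('1' if s in (1, 3) else '0')   # digit d_i
        c = 1 if s >= 2 else 0                  # new carry
    result = ''.join(reversed(d))
    return '1' + result if c == 1 else result

print(bin_int_add('111', '101'))   # 1100 (7 + 5 = 12)
```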
Proposition 4.2.3 The running time of Algorithm 4.2.2 is O(m) where m =
⌊log2(a)⌋ + 1.
Proof Recall that m = ⌊log2(a)⌋ + 1 is the number of bits in the binary
representation of a ≥ b.
The algorithm performs m iterations of the for-next loop. On each iteration the
algorithm performs one bit operation. Additionally, we have three more steps: c ←
0, checking if c = 1, and the output. Thus the total number of steps is m + 3. Since
m + 3 ≤ 2m for m ≥ 3, the result follows. 

We next consider subtraction. Let b = bm bm−1 . . . b2 b1 be a binary integer, i.e.,
b is a non-negative decimal integer written in binary. The 1’s complement of b is
the binary integer b̄ = b̄m b̄m−1 . . . b̄2 b̄1 where

b̄i = 0 if bi = 1, and b̄i = 1 if bi = 0.

The 2’s complement of b is b̄ + 1. We denote the 2’s complement of b as COMP(b).


For example, COMP(10111) = 01000 + 1 = 01001 and COMP(0000) = 10000.
Suppose a = am . . . a1 , b = bm . . . b1 are binary integers with a ≥ b. Then, as
one can check,

a − b = the rightmost m bits of (a + COMP(b)).

For example,

101010 − 010111 = rightmost 6 bits of (101010 + COMP(010111))


= rightmost 6 bits of (101010 + 101001)
= 010011,

which is correct.
Here is an algorithm that computes a − b assuming that a, b are binary integers
with a ≥ b.
Algorithm 4.2.4 (BIN_INT_SUB)
Input: integers a ≥ b ≥ 0 written in binary as a = am · · · a2 a1 ,
b = bm · · · b2 b1
Output: a − b in binary
Algorithm:
c ← COMP(b)
d ←a+c
output dm dm−1 . . . d2 d1

Proposition 4.2.5 The running time of Algorithm 4.2.4 is O(m) where m =
⌊log2(a)⌋ + 1.
Proof Each step of Algorithm 4.2.4 runs in time O(m), thus the total run time is
O(m). □
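The 2's complement trick can be sketched in a few lines; `comp` pads to the width of its argument, matching the text's examples (helper names are ours):

```python
def comp(b: str) -> str:
    """2's complement of the binary string b: flip each bit, then add 1."""
    flipped = ''.join('1' if x == '0' else '0' for x in b)
    return bin(int(flipped, 2) + 1)[2:].zfill(len(b))

def bin_int_sub(a: str, b: str) -> str:
    """Algorithm 4.2.4 (BIN_INT_SUB): for a >= b, a - b is the rightmost
    m bits of a + COMP(b)."""
    m = len(a)
    total = int(a, 2) + int(comp(b.zfill(m)), 2)
    return bin(total)[2:].zfill(m)[-m:]

print(comp('10111'))                    # 01001
print(bin_int_sub('101010', '010111'))  # 010011 (42 - 23 = 19)
```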

Let a = am · · · a2 a1 be the binary representation of the (decimal) integer a. Then
adding a to itself (doubling a), denoted as 2a, can be achieved in polynomial time
O(m). The operation 2a is performed by shifting the string a to the left one digit
and appending a 0 at the end.
Here is an algorithm that multiplies two binary integers.

Algorithm 4.2.6 (BIN_INT_MULT)


Input: integers a ≥ b ≥ 0 written in binary as a = am · · · a2 a1 ,
b = bm · · · b2 b1
Output: ab in binary
Algorithm:
c←0
for i = 1 to m
if bi = 1 then c ← c + a
a ← 2a
next i
output c


To see how this algorithm works, we multiply 3 · 5. In this case a = (3)2 = 011,
b = (5)2 = 101, so m = 3. We compute 5 · 3 = 101 · 011. On the first iteration, c
becomes 011, a is 0110. On the second iteration, c remains 011, a is 01100. On the
third iteration, c is 01111, which is then output as the correct answer, 15.
Proposition 4.2.7 The running time of Algorithm 4.2.6 is O(m²) where m =
⌊log2(a)⌋ + 1.
Proof The algorithm performs m iterations of the for-next loop. On each iteration
the algorithm performs at most 2 · O(m) bit operations. Thus the running time is
O(m) · (2 · O(m)) = O(m²). □
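BIN_INT_MULT is the classical shift-and-add method; a sketch (our transcription, with values kept as Python ints once parsed, which matches the arithmetic exactly):

```python
def bin_int_mult(a: str, b: str) -> str:
    """Algorithm 4.2.6 (BIN_INT_MULT): shift-and-add multiplication."""
    c = 0
    av = int(a, 2)
    for bit in reversed(b):    # b_1 (least significant bit) first
        if bit == '1':
            c += av            # c <- c + a
        av *= 2                # a <- 2a: shift left, append a 0
    return bin(c)[2:]

print(bin_int_mult('011', '101'))   # 1111 (3 * 5 = 15)
```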

We next consider division of (decimal) integers. Given integers a, n, a ≥ 0, n > 0,
we write an algorithm that computes ⌊a/n⌋.
Algorithm 4.2.8 (a_DIV_n)
Input: integers a ≥ 0, n > 0, with (a)2 = am . . . a2 a1
Output: ⌊a/n⌋
Algorithm:
c←0
while a ≥ n
for i = 1 to m
if 2i−1 n ≤ a < 2i n
then a ← a − 2i−1 n and c ← c + 2i−1
next i
end-while
output c


To show how Algorithm 4.2.8 works, we use it to compute ⌊8/5⌋ = 1. We have
a = 8, n = 5, and (8)2 = 1000, hence m = 4. Now, with i = 1, we have
5 ≤ 8 < 10, and so a = 3 and c = 1. The next three iterations of the for-next loop
leave a and c unchanged. Since 3 < 5, the while loop ends and we output c = 1, as
required.
Of course, if a is divided by n using the “long division” algorithm of elementary
arithmetic, we obtain the division statement

a = nq + r,

for unique integers q, r with 0 ≤ r < n. The value of q is precisely ⌊a/n⌋.


In Section 5.3.2, we will see that r coincides with the least non-negative residue
(a mod n).
Proposition 4.2.9 The running time of Algorithm 4.2.8 is O(m²) where m =
⌊log2(a)⌋ + 1.
Proof The while loop is iterated at most m times and on each iteration we perform
m steps each costing O(1). Thus the running time is O(m²). □
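Algorithm 4.2.8 can be sketched directly; each pass of the while loop strips off the largest power-of-two multiple of n that still fits in a (helper name is ours):

```python
def a_div_n(a: int, n: int) -> int:
    """Algorithm 4.2.8 (a_DIV_n): computes floor(a / n) for a >= 0, n > 0."""
    m = a.bit_length()            # m = floor(log2(a)) + 1 for a >= 1
    c = 0
    while a >= n:
        for i in range(1, m + 1):
            # if 2^(i-1)*n <= a < 2^i*n, subtract and record the quotient bit
            if (n << (i - 1)) <= a < (n << i):
                a -= n << (i - 1)
                c += 1 << (i - 1)
    return c

print(a_div_n(8, 5))    # 1
print(a_div_n(45, 4))   # 11
```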


4.3 Non-polynomial Time Algorithms

We next discuss algorithms whose running times are not the order of any polyno-
mial. These “non-polynomial” time algorithms cannot be considered practical or
efficient on inputs that are very large.
As an example, we consider an algorithm that decides whether a given integer
n ≥ 2 is prime or composite. In other words, our algorithm computes the function

f : {2, 3, 4, 5, . . . } → {YES,NO}

defined as

f (n) = YES if n is a prime number, and f (n) = NO if n is a composite number.

The algorithm is based on the following well-known fact from number theory.
Proposition 4.3.1 Let n be a composite number. Then n has an integer factor ≤ √n.
Proof Write n = ab for 1 < a < n, 1 < b < n. We can assume without loss of
generality that a ≤ b. Suppose that a > √n. Then b > √n, thus ab > √n · √n = n,
a contradiction. □

Thus if n has no integer factors ≤ √n, then it is prime.

Algorithm 4.3.2 (PRIME)


Input: an integer n ≥ 2
Output: YES if n is prime, NO if n is composite
Algorithm:
d←2
p ← YES
while p is YES and d ≤ √n
if d | n
then p ← NO
else d ← d + 1
end-while
output p
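A direct transcription of PRIME; `math.isqrt(n)` computes ⌊√n⌋ exactly, avoiding floating-point error for large n:

```python
import math

def prime(n: int) -> str:
    """Algorithm 4.3.2 (PRIME): trial division by d = 2, 3, ..., floor(sqrt(n))."""
    d = 2
    p = "YES"
    while p == "YES" and d <= math.isqrt(n):
        if n % d == 0:      # d | n, so n is composite
            p = "NO"
        else:
            d += 1
    return p

print(prime(97))   # YES
print(prime(91))   # NO (91 = 7 * 13)
```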


In order to compute the running time for Algorithm 4.3.2, we first need to measure
the size of the integer input n in terms of bits. We know that the integer n is of size

m = ⌊log2(n)⌋ + 1

as measured in bits.
Proposition 4.3.3 The running time for PRIME is O(2^{m/2}), where m =
⌊log2(n)⌋ + 1.
Proof The steps “d ← 2” and “p ← YES” count as 2 steps. Moreover, if n is prime,
then the while loop is repeated ⌊√n⌋ − 1 times. This is the maximum number of
times it will be repeated as a function of input n. Finally, the “output” step counts
as an additional step. Thus the maximal run time is

2 + (⌊√n⌋ − 1) + 1 = ⌊√n⌋ + 2.

Now, since m ≥ log2(n),

⌊√n⌋ + 2 = ⌊√(2^{log2(n)})⌋ + 2 ≤ √(2^m) + 2.

Since √(2^m) + 2 ≤ 2·√(2^m) for m ≥ 2, the running time for PRIME is O(2^{m/2}). □

We can show that the running time for PRIME is non-polynomial; the running
time O(2^{m/2}) is the best possible for PRIME.
To this end, observe that

2^{(m−1)/2} ≤ ⌊√n⌋ + 2,


for m ≥ 2. Now, if ⌊√n⌋ + 2 = O(p(m)) for some polynomial p(m), then
2^{(m−1)/2} = O(p(m)). Thus, there exist M, N ∈ R+ for which

2^{(m−1)/2} ≤ Mp(m),

for all m ≥ N . Consequently,

lim_{m→∞} 2^{(m−1)/2}/p(m) ≤ M,

which is impossible since

lim_{m→∞} 2^{(m−1)/2}/p(m) = ∞,

by L’Hôpital’s rule from calculus.


Thus the running time for PRIME grows exponentially with the size of the input
when the input is measured in bits; the running time is the order of the exponential
function 2^{m/2}. Algorithm PRIME is an “exponential time” algorithm.
Definition 4.3.4 An algorithm is an exponential time algorithm if its running time
t (m) satisfies t (m) = O(g(m)) where g(m) is an exponential function of m, and
where m is the size of the input as measured in bits.
An algorithm that has exponential running time cannot be considered practical
(or efficient) if the input is large.
In between polynomial and exponential time is “subexponential time”.
Definition 4.3.5 Let n ≥ 1 be an integer and let m = 1 + ⌊log2(n)⌋ ≈ log2(n),
so that polynomial time can be written in powers of O(m) and exponential time is
written in powers of O(n) = O(2^m). An algorithm runs in subexponential time if
its running time is

O(2^{β(log2(n))^α (log2(log2(n)))^{1−α}}),

where 0 < α < 1 and β > 0.
If α = 0, then

O(2^{β(log2(n))^α (log2(log2(n)))^{1−α}}) = O(2^{β log2(log2(n))}) = O(log2(n)^β) = O(m^β),

which is polynomial time. If α = 1, then

O(2^{β(log2(n))^α (log2(log2(n)))^{1−α}}) = O(2^{β log2(n)}) = O(2^{βm}),

and we have exponential time. If α is near 0, then the subexponential time is close to
polynomial time, if α is near 1, then the subexponential time is close to exponential
time. We will see some subexponential time algorithms in Sections 9.3 and 12.2.
We conclude that the problem of computing a function value is “easy” if there is a
polynomial time algorithm that computes the function value. Likewise, the problem
of computing a function value is “hard” if there is no polynomial time algorithm
that computes the function value.

4.4 Complexity Classes P, PP, BPP

Definition 4.4.1 A decision problem D is a problem whose solution is either “yes”


or “no”. An instance of the decision problem is an input for the problem that needs
to be decided. If S is the set of instances of decision problem D, then D can be
viewed as a function f : S → {YES, NO}; f is the function associated to D.
Example 4.4.2 Determining whether a given integer is prime or composite is a
decision problem,
D : given integer n ≥ 2, decide whether n is prime or composite
The set of instances for D is S = {2, 3, 4, . . . }, the associated function is

f (n) = YES if n is a prime number, and f (n) = NO if n is a composite number

Algorithms are used to solve (or decide) decision problems; an algorithm that
solves a decision problem actually computes its associated function. For instance,
PRIME solves the decision problem D of Example 4.4.2; PRIME computes the
function f (n) of Example 4.4.2.
Decision problems that can be solved by efficient, practical algorithms form a
special subclass of problems.
Definition 4.4.3 A decision problem that can be solved in polynomial time (that
is, by using a polynomial time algorithm) is a polynomial time decidable (or
solvable) decision problem. The class of all decision problems that are decidable
in polynomial time is denoted as P.
The decision problem which determines whether an integer is prime or composite
is in P. Note: this cannot be established using Algorithm 4.3.2 since this algorithm
is not polynomial time. See [59, Theorems 3.17 and 3.18].

4.4.1 Probabilistic Polynomial Time

Suppose that we want to write a practical (that is, polynomial time) algorithm to
solve a decision problem. Suppose that the best we can do is to devise an algorithm
that most of the time computes the correct answer but sometimes yields an incorrect
result. Is this kind of algorithm good enough?
For example, let D be a decision problem with associated function f , f (n) ∈
{YES, NO} for each instance n of the problem D.
Suppose that there is no polynomial time algorithm that will solve this decision
problem.
What we do have, however, is a polynomial time algorithm A, with input n and
output A(n) ∈ {YES, NO}, with the property that if f (n) = NO, then A will always
output NO, and if f (n) = YES, then A will likely output YES. That is, A satisfies
the conditional probabilities

Pr(A(n) = YES|f (n) = YES) > 1/2,                    (4.1)

Pr(A(n) = YES|f (n) = NO) = 0.                       (4.2)

Thus, more than half of the time, the polynomial time algorithm A computes the
right answer, but there is a chance that it will output the wrong answer. As we will
show in Proposition 4.4.5, one can devise a new algorithm that satisfies

Pr(output = YES|f (n) = YES) > 1 − ε,

for any ε > 0. And so, we can consider A an acceptable, practical algorithm for
computation.
Our hypothetical algorithm A is a “probabilistic” polynomial time algorithm.
A probabilistic algorithm is allowed to employ a random process in one or more
of its steps.
Decision problems which can be solved with probabilistic algorithms that run in
polynomial time define a broader class of decision problems than P.
Definition 4.4.4 Let D be a decision problem with instance n and associated
function f , f (n) ∈ {YES, NO}. Then D is decidable (or solvable) in probabilistic
polynomial time if there exists a probabilistic polynomial time algorithm A so
that
(i) Pr(A(n) = YES|f (n) = YES) > 1/2,
(ii) Pr(A(n) = YES|f (n) = NO) = 0.
The class of all decision problems that are decidable in probabilistic polynomial
time is denoted as PP.

Let p(x) be a positive polynomial, that is, p(x) is a polynomial with integer
coefficients, for which p(m) ≥ 1, whenever m ≥ 1.
Proposition 4.4.5 Let D be a decision problem in PP with function f . Suppose n is
an instance of D of size m in bits and let p(x) be a positive polynomial. Then there
exists a probabilistic polynomial time algorithm A′ so that
(i) Pr(A′(n) = YES|f (n) = YES) > 1 − 2^{−p(m)},
(ii) Pr(A′(n) = YES|f (n) = NO) = 0.
Proof Since D ∈ PP, there exists a probabilistic polynomial time algorithm A so
that

Pr(A(n) = YES|f (n) = YES) > 1/2

and

Pr(A(n) = YES|f (n) = NO) = 0.

Note that

Pr(A(n) = NO|f (n) = YES) = 1 − Pr(A(n) = YES|f (n) = YES) < 1/2.

We devise a new algorithm A′ as follows: Repeat A p(m) times; if A(n) = YES
on any repetition, then set A′(n) = YES, else A′(n) = NO. Thus,

Pr(A′(n) = NO|f (n) = YES) = (Pr(A(n) = NO|f (n) = YES))^{p(m)}
                            < 2^{−p(m)}.

Thus Pr(A′(n) = YES|f (n) = YES) > 1 − 2^{−p(m)}. Also, if f (n) = NO, then
Pr(A(n) = YES) = 0, thus, Pr(A′(n) = YES|f (n) = NO) = 0. □
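The amplification in Proposition 4.4.5 is easy to quantify numerically. The sketch below (our own illustration, not from the text) computes how many independent repetitions of a one-sided-error algorithm with success probability q > 1/2 are needed to push the failure probability (1 − q)^k below a target ε:

```python
def repeats_needed(q: float, eps: float) -> int:
    """Smallest k with (1 - q)**k < eps: after k independent runs of a
    one-sided-error algorithm that answers YES on a true YES instance
    with probability q, the chance that all k runs miss is (1 - q)**k.
    Assumes 1/2 < q < 1 and 0 < eps < 1."""
    k = 1
    while (1 - q) ** k >= eps:
        k += 1
    return k

# With q = 0.6, eight repetitions already push the error below 2**-10:
k = repeats_needed(0.6, 2 ** -10)
print(k, (1 - 0.6) ** k < 2 ** -10)   # 8 True
```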

There is a broader class of decision problems.
Definition 4.4.6 Let D be a decision problem with instance n and associated func-
tion f , f (n) ∈ {YES, NO}. Then D is bounded-error probabilistic polynomial
time decidable if there exists a probabilistic polynomial time algorithm A so that
(i) Pr(A(n) = YES|f (n) = YES) > 1/2,
(ii) Pr(A(n) = YES|f (n) = NO) < 1/2.
The class of all decision problems that are decidable in bounded-error probabilis-
tic polynomial time is denoted as BPP.

Proposition 4.4.7 Let D be a decision problem in BPP with function f . Suppose


n is an instance of D of size m in bits and let p(x) be a positive polynomial. Then
there exists a probabilistic polynomial time algorithm A′ so that
(i) Pr(A′(n) = YES|f (n) = YES) > 1 − 2^{−p(m)},
(ii) Pr(A′(n) = YES|f (n) = NO) < 2^{−p(m)}.
Proof Since D ∈ BPP, there exists a probabilistic polynomial time algorithm A so
that

Pr(A(n) = YES|f (n) = YES) = 1/2 + ε

and

Pr(A(n) = YES|f (n) = NO) = 1/2 − δ,

for 0 < ε, δ ≤ 1/2.

Without loss of generality, we assume that ε ≤ δ. Since 4ε² > 0, 0 ≤ 1 − 4ε² < 1
and we have

lim_{x→∞} (x + 1)(1 − 4ε²)^x = 0.

Thus there exists an integer c so that

(c + 1)(1 − 4ε²)^c < 1/2,

and so, (c + 1)^{p(m)} (1 − 4ε²)^{cp(m)} < 2^{−p(m)}. But

(cp(m) + 1)(1 − 4ε²)^{cp(m)} ≤ (c + 1)^{p(m)} (1 − 4ε²)^{cp(m)},

and so,

(cp(m) + 1)(1 − 4ε²)^{cp(m)} < 2^{−p(m)}.

Now ε ≤ δ implies 1 − 4ε² ≥ 1 − 4δ², thus

(cp(m) + 1)(1 − 4δ²)^{cp(m)} ≤ (cp(m) + 1)(1 − 4ε²)^{cp(m)} < 2^{−p(m)}.      (4.3)

We now prove (i). We devise a new algorithm A′ as follows: Repeat A 2cp(m) + 1
times; if A(n) = YES on a majority of the repetitions (at least cp(m) + 1 times), then
A′(n) = YES, otherwise, A′(n) = NO. From the binomial distribution function

with success probability 1


2 +  (see Example 2.4.2),

Pr(A (n) = NO|f (n) = YES)

 
cp(m)
2cp(m) + 1
   
1 + 2 i 1 − 2 2cp(m)+1−i
=
i 2 2
i=0
  
2cp(m) + 1 1 + 2 cp(m)
≤ (cp(m) + 1)
cp(m) 2
 cp(m)  
1 − 2 1 − 2
·
2 2
  cp(m)
(cp(m) + 1)(1 − 2) 2cp(m) + 1 1 − 4 2
=
2 cp(m) 4
 cp(m)
(cp(m) + 1) 2cp(m)+1 1 − 4 2
≤ (2 )
2 4
= (cp(m) + 1)(1 − 4 2 )cp(m)
< 2−p(m) by (4.3).

Thus,

Pr(A′(n) = YES|f (n) = YES) > 1 − 2^{−p(m)}.

Since A runs in polynomial time and A′ repeats A a polynomial number of times,
A′ is polynomial time. Thus (i) follows.
For (ii): Pr(A′(n) = YES|f (n) = NO)

= Σ_{i=cp(m)+1}^{2cp(m)+1} (2cp(m)+1 choose i) (1/2 − δ)^i (1/2 + δ)^{2cp(m)+1−i}

≤ (cp(m) + 1) (2cp(m)+1 choose cp(m)+1) (1/2 − δ)^{cp(m)+1} (1/2 + δ)^{cp(m)}

= ((cp(m) + 1)(1 − 2δ)/2) (2cp(m)+1 choose cp(m)+1) ((1 − 4δ²)/4)^{cp(m)}

≤ ((cp(m) + 1)/2) 2^{2cp(m)+1} ((1 − 4δ²)/4)^{cp(m)}

= (cp(m) + 1)(1 − 4δ²)^{cp(m)}

< 2^{−p(m)} by (4.3). □

Proposition 4.4.8 P ⊆ PP ⊆ BPP.


Proof This follows from the definitions of P, PP, and BPP. 

Remark 4.4.9 If A is any computer algorithm, then a Turing machine capable of
simulating the logical structure of A can be constructed. This is a special case of
the Church–Turing thesis, which states that anything that can be computed can
be computed by some Turing machine. (A Turing machine is a special type of finite
automaton, see [59, Section 2.2], [27, Chapter 2, Chapter 8].)
Thus in the definition of complexity classes P, PP, BPP given above, one
can replace “probabilistic polynomial time algorithm” with “probabilistic Turing
machine”, see [59, Chapter 4]. 

4.4.2 An Example

We give an example of a decision problem in BPP. Suppose Alice and Bob are using
the right shift cryptosystem ⟨M, C, e, d, K⟩ with symmetric key k (Section 1.1).
Here M consists of all 3-grams over the alphabet {A . . . Z} that are recognizable
3-letter words in plaintext English and C consists of all encryptions of words in M
using the correct key k, i.e., C = e(M, k).
For M ∈ M, let C = e(M, k) denote the encryption of M. Then

Pr(d(C, k) ∉ M) = 0.

For l, 0 ≤ l ≤ 25, with l ≠ k,

Pr(d(C, l) ∉ M) > 1/2.
Let D be the decision problem:
Given an integer l, 0 ≤ l ≤ 25, decide whether l is the key.
The associated function for D is f : {0, 1, 2, . . . , 25} → {YES, NO} with

f (l) = YES if l = k, and f (l) = NO if l ≠ k.

Proposition 4.4.10 D ∈ BPP.


Proof We have to show that there is a probabilistic polynomial time algorithm A so
that, for instance l of D,

Pr(A(l) = YES|f (l) = YES) > 1/2,

and

Pr(A(l) = YES|f (l) = NO) < 1/2.
To this end, we consider the algorithm
Algorithm (IS_KEY)
Input: an integer l ∈ {0, 1, 2, . . . , 25}
Algorithm:
choose C at random from C
M ← d(C, l)
if M ∈ M then output YES
else output NO


If l = k, i.e., if f (l) = YES, then Pr(A(l) = YES) = 1. Thus

Pr(A(l) = YES|f (l) = YES) = 1 > 1/2.

On the other hand, if l ≠ k, then

Pr(d(C, l) ∈ M) = 1 − Pr(d(C, l) ∉ M) < 1/2.

Thus, if f (l) = NO, then Pr(A(l) = YES) < 1/2, and so,

Pr(A(l) = YES|f (l) = NO) < 1/2.
Computation of d(C, l) uses the basic operations +, −, and so the algorithm runs in
polynomial time. Thus D ∈ BPP. 

Here is an instance where IS_KEY returns the wrong answer. Suppose that the
correct key is k = 4. Then

MFQ = e(IBM, 4),

and so MFQ ∈ C. Suppose that the input to IS_KEY is l = 12. Then f (12) = NO.
We now run the algorithm IS_KEY on input l = 12. Suppose that the randomly

chosen element of C is MFQ. Then

ATE = d(MFQ, 12),

with ATE ∈ M. Consequently, A(12) = YES, which is incorrect.
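This failure is easy to reproduce. The helper `shift` below (our own, not from the text) implements the right shift cipher on the alphabet A..Z, with a negative k for decryption:

```python
def shift(text: str, k: int) -> str:
    """Right shift by k on the alphabet A..Z; use a negative k to decrypt."""
    return ''.join(chr((ord(ch) - ord('A') + k) % 26 + ord('A')) for ch in text)

print(shift('IBM', 4))     # MFQ  -- encryption with the correct key k = 4
print(shift('MFQ', -12))   # ATE  -- the wrong key l = 12 still yields an English word
```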


Of course, by Proposition 4.4.5, IS_KEY can be modified to give an algorithm
which is nearly always correct: Let l be an integer, 0 ≤ l ≤ 25, of input size
m = ⌊log2(l)⌋ + 1, and let p(m) be a positive polynomial. By Proposition 4.4.5
there exists a probabilistic polynomial time algorithm A′ with

Pr(A′(l) = YES|f (l) = YES) = 1 > 1 − 2^{−p(m)},

Pr(A′(l) = YES|f (l) = NO) < 2^{−p(m)}.

4.5 Probabilistic Algorithms for Functions

Suppose that D is a decision problem in BPP with associated function f . Then there
exists a polynomial time algorithm A so that

Pr(A(n) = YES|f (n) = YES) = 1/2 + ε,

Pr(A(n) = YES|f (n) = NO) = 1/2 − δ,

for some ε, δ > 0. Thus

Pr(A(n) = NO|f (n) = NO) = 1/2 + δ.
Proposition 4.5.1 Let D ∈ BPP and let n be an instance of D. Then

Pr(A(n) = f (n)) > 1/2.
Proof Suppose that r = Pr(f (n) = YES), so that Pr(f (n) = NO) = 1 − r. Then

Pr(A(n) = f (n)) = r Pr(A(n) = YES|f (n) = YES)
                   + (1 − r) Pr(A(n) = NO|f (n) = NO)
                 = r(1/2 + ε) + (1 − r)(1/2 + δ)
                 = r/2 + rε + 1/2 + δ − r/2 − rδ
                 = 1/2 + rε + (1 − r)δ
                 > 1/2. □


From Proposition 4.5.1 we see that A is a polynomial time algorithm that
computes f in the sense that it is likely to compute f for an arbitrary instance.
We say that A is a probabilistic polynomial time algorithm that computes the
function f .
In Chapter 9 we will encounter probabilistic polynomial time algorithms that
attempt to compute other functions.

4.6 Exercises

1. How many bits are required to represent the decimal integer n = 237 in binary?
2. Compute the size of the decimal integer n = 39 in bits. Compute (39)2 .
3. Let A be an algorithm with running time O(n³) where n is an integer. Compute
the running time as a function of bits. Is A a polynomial time algorithm?
4. Use the algorithm a_DIV_n to compute 45/4.
5. Assume that n is an even integer, n ≥ 0. Write an algorithm that computes the
function f (n) = n/2 and determine whether the algorithm is efficient.
6. Assume that n is an integer, n ≥ 0. Write an algorithm that computes the
function f (n) = 2n and determine the running time of the algorithm.
7. In this exercise we consider another algorithm that adds integers. The unary
representation of an integer a ≥ 0, is a string of 1’s of length a. For example,
8 in unary is

11111111.

Suppose a is written in unary. Let a+ denote the unary representation formed


by appending a 1 to a and let a− denote the unary representation formed by
deleting a 1 from the end of a.
Algorithm (UN_INT_ADD)
Input: integers a ≥ b ≥ 0 encoded in unary
Output: a + b encoded in unary
Algorithm:
while b = 0 do
a ← a+
b ← b−

end-while
output a


Compute the running time of UN_INT_ADD.


8. Let n ≥ 1 be an integer and let S be a set of n positive integers. The well-known
algorithm MERGE_SORT sorts the elements of S from smallest to largest [65].
It is known that the running time of MERGE_SORT is O(n log2 (n)), where
the input n is the size of the set S to be sorted. Show that MERGE_SORT is
non-polynomial time.
9. The bit-wise truth table for exclusive AND (XAND) is given as

x y XAND(x, y)
0 0 1
0 1 0
1 0 0
1 1 1

Let {0, 1}^m denote the set of all binary strings of length m and let

f : {0, 1}^m × {0, 1}^m → {0, 1}^m

be the function defined as

f (a, b) = c,

where a = am am−1 · · · a2 a1 , b = bm bm−1 · · · b2 b1 are binary strings of length
m and c = cm cm−1 · · · c2 c1 is the binary string where ci = XAND(ai , bi ),
for 1 ≤ i ≤ m.
The following algorithm computes f :
Algorithm
Input: two strings in binary a = am am−1 · · · a2 a1 ,
b = bm bm−1 · · · b2 b1
Algorithm:
for i = 1 to m
if ai = bi
then ci ← 1
else ci ← 0
next i
output cm cm−1 · · · c2 c1

(a) Compute the output of the algorithm if the input is a = 1100, b = 0100.
(b) Determine the running time of the algorithm.
10. Let f : R+ → R+ , g : R+ → R+ be functions with f (x) = O(x), and
g(x) = O(x²). Let h(x) = f (x)g(x). Show that h(x) = O(x³).
11. Let A be an algorithm that runs in time O(m²). Show that A runs in time O(m^r)
for r ≥ 2.
12. Let A be an algorithm that runs in subexponential time O(2^{√(m log2(m))}), where
the input has length m in bits. Show that A is not polynomial time.
13. Let D be the decision problem:
Given two integers a ≥ 0, n > 0, determine whether n divides a evenly
(denoted as n | a).
Show that D ∈ P.
14. Let D be the decision problem:
Let p(x) be a polynomial of degree d with integer coefficients. Determine
whether p(x) is constant, i.e., p(x) = c for some integer c.
Show that D ∈ PP.
15. Let D be a decision problem with function f . Let n be an instance of D and
suppose there exists a probabilistic polynomial time algorithm A so that
(i) Pr(A(n) = YES|f (n) = YES) ≥ c,
(ii) Pr(A(n) = YES|f (n) = NO) = 0,
where 0 < c ≤ 1. Show that D ∈ PP.
16. Let D be a decision problem with function f and let n be an instance of D of
size m. Let p(x) be a positive polynomial for which p(m) ≥ 2. Suppose there
exists a probabilistic polynomial time algorithm A so that
(i) Pr(A(n) = YES|f (n) = YES) ≥ 1/p(m),
(ii) Pr(A(n) = YES|f (n) = NO) = 0.
Show that D ∈ PP.
Chapter 5
Algebraic Foundations: Groups

Algebraic concepts such as groups, rings, and fields are essential for the study of
symmetric key and public key cryptography.

5.1 Introduction to Groups

Let S be a non-empty set of elements. A binary operation on S is a function

B : S × S → S.

We denote the image B(a, b) by ab. A binary operation is commutative if for all
a, b ∈ S,

ab = ba;

it is associative if for all a, b, c ∈ S,

a(bc) = (ab)c.

For example, let

Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . . }

denote the set of integers. Then ordinary addition,

+ : Z × Z → Z, (a, b) → a + b,

© Springer Nature Switzerland AG 2022
R. G. Underwood, Cryptography for Secure Encryption, Universitext,
https://doi.org/10.1007/978-3-030-97902-7_5

is a binary operation on Z. Since a + b = b + a, and a + (b + c) = (a + b) +


c, ∀a, b, c ∈ Z, ordinary addition is a commutative and associative binary operation
on Z.
A semigroup is a non-empty set S together with an associative binary operation
S × S → S. For example, the set Z together with ordinary addition (sometimes
denoted as ⟨Z, +⟩) is a semigroup.
A monoid is a semigroup S for which there exists an element e ∈ S with

ea = a = ae,

∀a ∈ S. The element e is an identity element for the monoid. If we take e = 0,


then the semigroup ⟨Z, +⟩ is also a monoid, since ∀a ∈ Z, a + 0 = a = 0 + a.
Here is an important example of a monoid. An alphabet is a finite set Σ =
{s1 , s2 , . . . , sk } whose elements are the letters of the alphabet. A word over Σ is a
finite sequence of letters in Σ. A word over Σ can be written as

x = a1 a2 a3 . . . al ,

where ai ∈ Σ for 1 ≤ i ≤ l; the length of the word x is the number of letters in x.
Let Σ be an alphabet. The closure of Σ, denoted by Σ∗ , is the collection of all
words over Σ. The closure Σ∗ contains a unique word of length 0, called the empty
word, which is denoted as ε.
On Σ∗ there is a binary operation called concatenation, denoted by ·, which is
defined as follows. Let x = a1 a2 . . . al , y = b1 b2 . . . bm be two words in Σ∗ . Then

x · y = xy = a1 a2 . . . al b1 b2 . . . bm .

One easily shows that · is an associative binary operation: for words x, y, z ∈ Σ∗ ,

we have

x · (y · z) = x · yz
= xyz
= xy · z
= (x · y) · z,

thus ⟨Σ∗ , ·⟩ is a semigroup. In fact, if we take e to be the empty word ε, then ⟨Σ∗ , ·⟩
is a monoid, which is the monoid of words over Σ.


We have seen this monoid before. For our alphabet, take

Σ = {A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z}.

Then

Σ∗ = ⋃_{n≥0} L^n = L^0 ∪ L^1 ∪ L^2 ∪ · · · ,

where L^n is the collection of all n-grams over Σ; ⟨Σ∗ , ·, ε⟩ is a monoid. In this
monoid, for instance,

SCARLET · FIRE = SCARLETFIRE.

The set of plaintext English words is a small subset of Σ∗ ; given a cryptosystem, the
set of all possible ciphertext encryptions of these plaintext English words is some
other subset of Σ∗ .
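In Python, strings over an alphabet already realize the monoid of words: concatenation is associative and the empty string is the identity element. A quick check of the monoid laws (a sketch, not from the book):

```python
x, y, z = "SCARLET", "FIRE", "DANCE"
eps = ""                            # the empty word epsilon

assert (x + y) + z == x + (y + z)   # concatenation is associative
assert eps + x == x == x + eps      # epsilon is an identity element
print(x + y)                        # SCARLETFIRE
```

Note that concatenation is not commutative, which is exactly why this monoid is not abelian.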
A group is a monoid with an additional property.
Definition 5.1.1 A group is a non-empty set G together with a binary operation
G × G → G for which
(i) the binary operation is associative;
(ii) there exists an element e ∈ G for which ea = a = ae, for all a ∈ G;
(iii) for each a ∈ G, there exists an element c ∈ G for which ca = e = ac.
An element e satisfying (ii) is an identity element for G; an element c satisfying
(iii) is an inverse element of a and is denoted by a −1 .
The order |G| of a group G is the number of elements in G. If G is a finite set,
i.e., |G| = n < ∞, then G is a finite group. If G is not finite, then G is an infinite
group.
A group G in which the binary operation is commutative is an abelian group.
The monoid Σ∗ of words over the alphabet Σ is not abelian and is not a group.

5.2 Examples of Infinite Groups

Example 5.2.1 ⟨Z, +⟩ is an infinite group. To prove this we check that conditions
(i), (ii), (iii) of Definition 5.1.1 hold. For a, b, c ∈ Z,

a + (b + c) = (a + b) + c,

thus + is an associative binary operation, so (i) holds. Setting e = 0, we see that

a + 0 = a = 0 + a,

so 0 is an identity element, so (ii) holds, and finally, for each a ∈ Z, let c = −a,
then

a + (−a) = 0 = (−a) + a,

so −a is an inverse for a, and so (iii) holds.


Clearly, Z is an infinite set.


Since

a + b = b + a,

for all a, b ∈ Z, ⟨Z, +⟩ is an infinite abelian group with an “additive” binary
operation.
Let Q denote the set of rational numbers, and let R denote the set of real numbers.
Then ⟨Q, +⟩ and ⟨R, +⟩ are infinite abelian additive groups.
Example 5.2.2 Let Q+ denote the set of positive rational numbers, that is,

Q+ = {a ∈ Q | a > 0}.

Let · denote ordinary multiplication. Then ⟨Q+ , ·⟩ is an infinite group. To prove this,
we show that conditions (i), (ii), and (iii) of Definition 5.1.1 hold: For a, b, c ∈ Q+ ,

a · (b · c) = (a · b) · c,

thus · is an associative binary operation, so (i) holds. Setting e = 1, we see that

a · 1 = a = 1 · a,

so 1 is an identity element, so (ii) holds, and for each a ∈ Q+ , let c = 1/a; then

a · (1/a) = 1 = (1/a) · a,

so 1/a is an inverse for a, and so (iii) holds. Finally, Q+ is not finite, thus ⟨Q+ , ·⟩ is an
infinite group.
Since

a · b = b · a, ∀a, b ∈ Q+ ,

⟨Q+ , ·⟩ is an infinite abelian group with a “multiplicative” binary operation.



Example 5.2.3 Let R× denote the set of non-zero real numbers, i.e.,

R× = {x ∈ R | x ≠ 0}.

Then one can easily show that ⟨R× , ·⟩ is an infinite abelian group.

Example 5.2.4 Let GL2 (R) denote the collection of all invertible 2 × 2 matrices
with entries in R. Let · denote ordinary matrix multiplication. Then ⟨GL2 (R), ·⟩ is
an infinite multiplicative group which is not abelian.

Proof Recall from linear algebra that matrix multiplication is associative; moreover,

e = I2 = [ 1 0 ]
         [ 0 1 ]

serves as an identity element. Since A ∈ GL2 (R) is invertible,
there exists A^{-1} ∈ GL2 (R) with

AA^{-1} = I2 = A^{-1} A,

thus ⟨GL2 (R), ·⟩ is an infinite multiplicative group. Since

[ 1 1 ] [ 0 1 ]   [ 0 1 ] [ 1 1 ]
[ 0 1 ] [ 1 0 ] ≠ [ 1 0 ] [ 0 1 ],

GL2 (R) is not abelian. □



More generally, the collection of l × l invertible matrices over R, denoted as
GLl (R) is a group under matrix multiplication. In the case l = 1, we identify
GL1 (R) with R× .
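The non-commutativity of GL2(R) is easy to check by hand or by machine; the following sketch multiplies the two matrices from the text using plain nested lists (no external libraries):

```python
def matmul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 1], [0, 1]]
B = [[0, 1], [1, 0]]

print(matmul(A, B))  # [[1, 1], [1, 0]]
print(matmul(B, A))  # [[0, 1], [1, 1]]
assert matmul(A, B) != matmul(B, A)   # GL2(R) is not abelian
```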
When studying group theory, non-examples are just as important as examples!
For instance, let Σ be the alphabet {0, 1}. Then Σ∗ = {0, 1}∗ is the collection of all
finite sequences of 0’s and 1’s; {0, 1}∗ is infinite.
If we take e to be the empty string, then {0, 1}∗ is a monoid under the binary
operation concatenation. However, {0, 1}∗ fails to be a group since no element other
than e has an inverse (Definition 5.1.1(iii) fails).

5.3 Examples of Finite Groups

Recall that a group G is finite if |G| = n < ∞. We give two examples of finite
groups that are important in cryptography.

5.3.1 The Symmetric Group on n Letters

Let Σ = {0, 1, 2, 3, . . . , n − 1} be an alphabet of n letters. A permutation of Σ is
a function σ : Σ → Σ that is both 1-1 and onto (equivalently: σ is a one-to-one
correspondence, or σ is a bijection).
Let σ : Σ → Σ be a permutation of Σ. For i ∈ Σ, we let σ(i) ∈ Σ denote the
image of i under σ. We can write σ in convenient permutation notation:

σ = [ 0     1     2     3     · · ·  n − 2     n − 1    ]
    [ σ(0)  σ(1)  σ(2)  σ(3)  · · ·  σ(n − 2)  σ(n − 1) ],

in which the domain of σ, Σ = {0, 1, 2, 3, . . . , n − 1}, is written along the top
row, and under each domain value i, we write its image σ(i), forming the bottom
row. For instance, if Σ = {0, 1, 2}, and σ : Σ → Σ is the permutation defined as
σ(0) = 1, σ(1) = 2, σ(2) = 0, then

σ = [ 0 1 2 ]
    [ 1 2 0 ].

There are

n! = n × (n − 1) × (n − 2) × · · · × 3 × 2 × 1

possible permutations of the set of n letters.


Let Sn denote the collection of all permutations of Σ, so |Sn | = n!. We define a
binary operation on Sn

◦ : Sn × Sn → Sn

by the rule: for σ, τ ∈ Sn , i ∈ Σ,

(σ ◦ τ )(i) = σ (τ (i)).

Note that σ ◦ τ ∈ Sn ; ◦ is ordinary function composition.


Proposition 5.3.1 ⟨Sn , ◦⟩ is a group.
Proof We show that conditions (i), (ii), (iii) of Definition 5.1.1 hold. Let σ, τ, ρ ∈
Sn . Then σ ◦ (τ ◦ ρ) = (σ ◦ τ ) ◦ ρ, since function composition is always associative.
Next, let e be the identity permutation:

e = [ 0 1 2 3 · · · n − 2 n − 1 ]
    [ 0 1 2 3 · · · n − 2 n − 1 ].

Then e ◦ σ = σ = σ ◦ e, so (ii) holds. Finally, let σ ∈ Sn . Then σ : Σ → Σ
is a bijection, thus there exists an inverse function σ^{-1} : Σ → Σ, which is also a
permutation. Moreover, σ^{-1} ◦ σ = e = σ ◦ σ^{-1} , thus (iii) holds.
It follows that ⟨Sn , ◦⟩ is a group of order |Sn | = n!. □

The group ⟨Sn , ◦⟩ is the symmetric group on n letters.
Example 5.3.2 Let n = 3, so that Σ = {0, 1, 2}. Then ⟨S3 , ◦⟩ is the symmetric
group on 3 letters; S3 contains 3! = 6 permutations of the 3 letter set {0, 1, 2}.
These 6 permutations can be listed as:

σ1 = [ 0 1 2 ]    σ2 = [ 0 1 2 ]    σ3 = [ 0 1 2 ]
     [ 0 1 2 ]         [ 2 0 1 ]         [ 1 2 0 ]

σ4 = [ 0 1 2 ]    σ5 = [ 0 1 2 ]    σ6 = [ 0 1 2 ]
     [ 0 2 1 ]         [ 2 1 0 ]         [ 1 0 2 ].


The permutation σ1 is an identity element in S3 . In S3 ,

σ2 ◦ σ5 = σ6

since

(σ2 ◦ σ5 )(0) = σ2 (σ5 (0)) = σ2 (2) = 1 = σ6 (0),


(σ2 ◦ σ5 )(1) = σ2 (σ5 (1)) = σ2 (1) = 0 = σ6 (1),
(σ2 ◦ σ5 )(2) = σ2 (σ5 (2)) = σ2 (0) = 2 = σ6 (2).

The group product σ2 ◦ σ5 = σ6 can also be computed using “right-to-left”
permutation multiplication:

[ 0 1 2 ] [ 0 1 2 ]   [ 0 1 2 ]
[ 2 0 1 ] [ 2 1 0 ] = [ 1 0 2 ],

in which reading right-to-left, 2 → 0 → 2, 1 → 1 → 0, and 0 → 2 → 1.


The complete table of binary operations in S3 is:

◦ σ1 σ2 σ3 σ4 σ5 σ6
σ1 σ1 σ2 σ3 σ4 σ5 σ6
σ2 σ2 σ3 σ1 σ5 σ6 σ4
σ3 σ3 σ1 σ2 σ6 σ4 σ5
σ4 σ4 σ6 σ5 σ1 σ3 σ2
σ5 σ5 σ4 σ6 σ2 σ1 σ3
σ6 σ6 σ5 σ4 σ3 σ2 σ1

From the table, one finds that σ3^{-1} = σ2 since σ2 ◦ σ3 = σ1 = σ3 ◦ σ2 . Moreover,
S3 is non-abelian since σ4 ◦ σ3 ≠ σ3 ◦ σ4 .
Here is another example of “right-to-left” permutation multiplication. Let S5 be
the symmetric group on 5 letters and let
   
01234 01234
σ = , τ= .
10342 32104

be elements of S5 . Then the right-to-left permutation multiplication yields



    
σ ◦ τ = [ 0 1 2 3 4 ] [ 0 1 2 3 4 ]   [ 0 1 2 3 4 ]
        [ 1 0 3 4 2 ] [ 3 2 1 0 4 ] = [ 4 3 0 1 2 ]

since (reading right-to-left) 4 → 4 → 2, 3 → 0 → 1, and so on.
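Right-to-left composition is a one-liner once a permutation of {0, . . . , n − 1} is stored as a tuple whose i-th entry is the image of i. The following sketch (names ours) reproduces the S5 computation above:

```python
def compose(sigma, tau):
    """Right-to-left composition: (sigma o tau)(i) = sigma(tau(i))."""
    return tuple(sigma[tau[i]] for i in range(len(tau)))

sigma = (1, 0, 3, 4, 2)   # bottom row of sigma
tau   = (3, 2, 1, 0, 4)   # bottom row of tau
print(compose(sigma, tau))  # (4, 3, 0, 1, 2)
```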

Cycle Decomposition

Let Σ = {0, 1, 2, . . . , n − 1} be the set of n letters, and let {a1 , a2 , . . . , al } ⊆ Σ be a
subset of letters. A cycle is a permutation of the form (a1 , a2 , a3 , . . . , al−1 , al ) that
denotes the permutation
denotes the permutation

a1 → a2 → a3 → · · · → al−1 → al → a1 ,

which fixes all of the other letters in Σ. The length of the cycle (a1 , a2 , . . . , al ) is l.
Every permutation in Sn can be written as a product of cycles yielding the cycle
decomposition of the permutation. For example, let σ ∈ S5 be the permutation
given above. We have, under σ,

0 → 1 → 0,

and

2 → 3 → 4 → 2.

Thus σ factors as

σ = (0, 1)(2, 3, 4),

where the cycles (0, 1) and (2, 3, 4) can be written in standard notation as
   
(0, 1) = [ 0 1 2 3 4 ]    and    (2, 3, 4) = [ 0 1 2 3 4 ]
         [ 1 0 2 3 4 ]                       [ 0 1 3 4 2 ].
 
Moreover, for

τ = [ 0 1 2 3 4 ]
    [ 3 2 1 0 4 ] ∈ S5 ,

we obtain, under τ,

0 → 3 → 0,
1 → 2 → 1,

and
4 → 4.

Thus

τ = (0, 3)(1, 2)(4),

where
   
(0, 3) = [ 0 1 2 3 4 ]    (1, 2) = [ 0 1 2 3 4 ]
         [ 3 1 2 0 4 ]             [ 0 2 1 3 4 ],

and

(4) = [ 0 1 2 3 4 ]
      [ 0 1 2 3 4 ].

A cycle of length 2 is a transposition. Every permutation in Sn , n ≥ 2, can be


written as a product of transpositions by first obtaining its cycle decomposition and
then using the cycle decomposition formula

(a1 , a2 , . . . , al ) = (a1 , al )(a1 , al−1 ) · · · (a1 , a2 ).

For instance, for σ , τ in S5 as above,

σ = (0, 1)(2, 4)(2, 3) and τ = (0, 3)(1, 2).

Note that cycles of length one may be omitted from the cycle decomposition.
If σ ∈ Sn , n ≥ 2, can be written as a product of an even number of transpositions,
then it is an even permutation; if σ factors into an odd number of transpositions,
then σ is an odd permutation. In our example, σ ∈ S5 is odd, while τ is even.
A permutation in Sn , n ≥ 2, cannot be both even and odd. See Section 5.8,
Exercise 7.
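The cycle decomposition and the resulting parity can be computed mechanically. The sketch below (names ours) follows each unvisited letter around its orbit, and uses the fact that a cycle of length l factors into l − 1 transpositions; it reproduces the examples for σ, τ ∈ S5:

```python
def cycles(perm):
    """Cycle decomposition of a permutation given as a tuple of images."""
    seen, out = set(), []
    for start in range(len(perm)):
        if start not in seen:
            cyc, i = [], start
            while i not in seen:          # follow the orbit of start
                seen.add(i)
                cyc.append(i)
                i = perm[i]
            out.append(tuple(cyc))
    return out

def is_even(perm):
    # each cycle of length l contributes l - 1 transpositions
    return sum(len(c) - 1 for c in cycles(perm)) % 2 == 0

sigma = (1, 0, 3, 4, 2)
print(cycles(sigma))     # [(0, 1), (2, 3, 4)]
print(is_even(sigma))    # False: sigma is odd
```

Cycles of length one (fixed points) appear in the output here, although, as noted above, they may be omitted from the decomposition.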

5.3.2 The Group of Residues Modulo n

Let n, a be integers with n > 0. A residue of a modulo n is any integer r for which
a = nq + r for some q ∈ Z. For instance, if n = 3, a = 8, then 11 is a residue of 8
modulo 3 since 8 = 3(−1) + 11, but so is 2 since 8 = 3(2) + 2.
The least non-negative residue of a modulo n is the smallest non-negative
number r for which a = nq + r. The possible least non-negative residues of a
modulo n are 0, 1, 2, . . . , n − 1. The least non-negative residue of a modulo n is

denoted as (a mod n). For example, (8 mod 3) = 2, moreover, (−3 mod 4) = 1 and
(11 mod 4) = (3 mod 4) = 3.
For n, a ∈ Z, n > 0, a ≥ 0, the value of (a mod n) coincides with the value of
r obtained when we divide a by n using the long division algorithm, yielding the
division statement

a = nq + r, r = (a mod n).

In fact, as we saw in Algorithm 4.2.8, q = ⌊a/n⌋, thus

a = n⌊a/n⌋ + (a mod n),

with 0 ≤ (a mod n) < n.
We say that two integers a, b are congruent modulo n if (a mod n) = (b mod n),
and we write a ≡ b (mod n). Let a, n be integers with n > 0. Then n divides a,
denoted by n | a, if there exists an integer k for which a = nk.
Proposition 5.3.3 Let a, b, n ∈ Z, n > 0. Then a ≡ b (mod n) if and only if
n | (a − b).
Proof To prove the “only if” part, assume that a ≡ b (mod n). Then (a mod n) =
(b mod n), so there exist integers l, m for which a = nm + r and b = nl + r with
r = (a mod n) = (b mod n). Thus a − b = n(m − l). For the “if” part, assume
that a − b = nk for some k. Then (nm + (a mod n)) − (nl + (b mod n)) = nk for
some m, l ∈ Z, so that n divides (a mod n) − (b mod n). Since this difference has
absolute value less than n, (a mod n) − (b mod n) = 0, hence a ≡ b (mod n). □

Proposition 5.3.3 can help us compute (a mod n). For instance,

(−14 mod 17) = (3 mod 17) = 3

since 17 | (−14 − 3). Likewise (−226 mod 17) = (12 mod 17) = 12 since 17 |
(−226 − 12).
The standard way to compute (a mod n) for a ≥ 0 is by using the formula

(a mod n) = a − ⌊a/n⌋n.

For example, ⌊226/17⌋ = 13, thus (226 mod 17) = 226 − 13 · 17 = 5. The following
algorithm computes (a mod n) using Algorithm 4.2.8.
algorithm computes (a mod n) using Algorithm 4.2.8.
Algorithm 5.3.4 (a_MOD_n)
Input: integers a ≥ 0, n > 0, with (a)2 = am . . . a2 a1 ,
i.e., am . . . a2 a1 is the binary representation of a
Output: (a mod n)
Algorithm:
d ← a_DIV_n

r ← a − dn
output r

Proposition 5.3.5 a_MOD_n runs in time O(m^2).
Proof Each step of the algorithm runs in time O(m^2). □
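The formula behind a_MOD_n translates directly into Python (the function name is ours); for a ≥ 0 and n > 0 the result agrees with Python's built-in % operator:

```python
def a_mod_n(a: int, n: int) -> int:
    """Compute (a mod n) via (a mod n) = a - floor(a/n)*n, for a >= 0, n > 0."""
    d = a // n        # a_DIV_n: floor division
    return a - d * n

print(a_mod_n(226, 17))          # 5
assert a_mod_n(226, 17) == 226 % 17
```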

For n > 0 consider the set J = {0, 1, 2, 3, . . . , n − 1} of least non-negative
residues modulo n. Note that a = (a mod n), ∀a ∈ J . On J we define a binary
operation +n as follows: for a, b ∈ J ,

(a mod n) +n (b mod n) = ((a + b) mod n).

Proposition 5.3.6 ⟨J, +n ⟩ is a finite abelian group of order n.
Proof Exercise. □

The group ⟨J, +n ⟩ is the group of residues modulo n. We denote this group by Zn ;
|Zn | = n. In the case n = 5, Z5 = {0, 1, 2, 3, 4} is the group of residues modulo 5.
In Z5 , for instance, 2 +5 3 = ((2 + 3) mod 5) = (5 mod 5) = 0. The complete group
table (binary operation table) for Z5 is:

+5 0 1 2 3 4
0 0 1 2 3 4
1 1 2 3 4 0
2 2 3 4 0 1
3 3 4 0 1 2
4 4 0 1 2 3

For another example, consider the group Z8 = {0, 1, 2, 3, 4, 5, 6, 7}. In this group,
4 +8 5 = ((4 + 5) mod 8) = (9 mod 8) = 1. The group table appears as

+8 0 1 2 3 4 5 6 7
0 0 1 2 3 4 5 6 7
1 1 2 3 4 5 6 7 0
2 2 3 4 5 6 7 0 1
3 3 4 5 6 7 0 1 2
4 4 5 6 7 0 1 2 3
5 5 6 7 0 1 2 3 4
6 6 7 0 1 2 3 4 5
7 7 0 1 2 3 4 5 6

5.4 Direct Product of Groups

We can construct a new group from a finite set of groups.
Let S1 , S2 , . . . , Sk be a finite collection of sets. The cartesian product ∏_{i=1}^k Si
is the collection of all k-tuples {(a1 , a2 , . . . , ak ) : ai ∈ Si }. For example, if S1 =
S2 = R, then the cartesian product ∏_{i=1}^2 Si = R × R is the familiar xy-coordinate
system that you have seen in calculus class.
Proposition 5.4.1 Let Gi , i = 1, . . . , k, be a finite collection of groups. Then the
cartesian product ∏_{i=1}^k Gi is a group under the binary operation defined as

(a1 , a2 , . . . , ak ) · (b1 , b2 , . . . , bk ) = (a1 b1 , a2 b2 , . . . , ak bk ),

where ai bi is the group product in Gi for 1 ≤ i ≤ k.


Proof We show that the conditions of Definition 5.1.1 hold. For associativity, let
(a1 , a2 , . . . , ak ), (b1 , b2 , . . . , bk ), (c1 , c2 , . . . , ck ) be elements of ∏_{i=1}^k Gi . Then

(a1 , a2 , . . . , ak ) · ((b1 , b2 , . . . , bk ) · (c1 , c2 , . . . , ck ))


= (a1 , a2 , . . . , ak ) · (b1 c1 , b2 c2 , . . . , bk ck )
= (a1 (b1 c1 ), a2 (b2 c2 ), . . . , ak (bk ck ))
= ((a1 b1 )c1 , (a2 b2 )c2 , . . . , (ak bk )ck )
= (a1 b1 , a2 b2 , . . . , ak bk ) · (c1 , c2 , . . . , ck )
= ((a1 , a2 , . . . , ak ) · (b1 , b2 , . . . , bk )) · (c1 , c2 , . . . , ck )

and so · is associative. For an identity element we take e = (e1 , e2 , . . . , ek ) where


ei is an identity in Gi . Then

(e1 , e2 , . . . , ek ) · (a1 , a2 , . . . , ak ) = (e1 a1 , e2 a2 , . . . , ek ak )


= (a1 , a2 , . . . , ak )
= (a1 e1 , a2 e2 , . . . , ak ek )
= (a1 , a2 , . . . , ak ) · (e1 , e2 , . . . , ek )

as required.
Lastly, for each k-tuple (a1 , a2 , . . . , ak ) one has

(a1 , a2 , . . . , ak )^{-1} = (a1^{-1} , a2^{-1} , . . . , ak^{-1} ).

We leave it to the reader to verify that (a1^{-1} , a2^{-1} , . . . , ak^{-1} ) is a left and right inverse
for the element (a1 , a2 , . . . , ak ). □



The group ki=1 Gi of Proposition 5.4.1 is the direct product group. As an
illustration, we take G1 = Z4 and G2 = Z5 and form the direct product group
Z4 × Z5 . In Z4 × Z5 , for instance,

(3, 2) · (1, 4) = (3 +4 1, 2 +5 4) = (0, 1).

Note that |Z4 × Z5 | = 20.
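The componentwise operation in Z4 × Z5 can be sketched in a single line of Python (names ours), reproducing (3, 2) · (1, 4) = (0, 1):

```python
def prod_op(u, v, moduli=(4, 5)):
    """Componentwise operation in Z4 x Z5: add each coordinate mod its modulus."""
    return tuple((a + b) % n for a, b, n in zip(u, v, moduli))

print(prod_op((3, 2), (1, 4)))  # (0, 1)
```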


As another example, we take G1 = G2 = G3 = R, where R is the additive group
of real numbers. Then ∏_{i=1}^3 R = R^3 is the underlying group of Euclidean 3-space
consisting of 3-dimensional vectors of the form v = (a1 , a2 , a3 ) where a1 , a2 , a3 ∈
R. The group operation is the familiar vector addition: for v = (a1 , a2 , a3 ), w =
(b1 , b2 , b3 ) ∈ R3 ,

v + w = (a1 , a2 , a3 ) + (b1 , b2 , b3 ) = (a1 + b1 , a2 + b2 , a3 + b3 ).

5.5 Subgroups

In this section we consider subgroups, which are the analogs for groups of subsets of
a set. We determine the collection of left and right cosets of a subgroup in a group,
give the partition theorem and Lagrange’s Theorem.
Let H be a subset of a group G. Then the binary operation B on G restricts to a
function B|H : H × H → G. If B|H (H × H ) ⊆ H , then H is closed under the
binary operation B. In other words, H is closed under B if B(a, b) = ab ∈ H for
all a, b ∈ H . If H is closed under B, then B|H is a binary operation on H . Closure
is fundamental to the next definition.
Definition 5.5.1 Let H be a subset of a group G that satisfies the following
conditions.
(i) H is closed under the binary operation of G,
(ii) e ∈ H ,
(iii) for all a ∈ H , a −1 ∈ H .
Then H is a subgroup of G, which we denote by H ≤ G.
For example, 2Z = {2n | n ∈ Z} ≤ Z. The subset {0, 3, 6, 9} is a subgroup of
Z12 . The subset {σ1 , σ4 } is a subgroup of S3 . The set of integers Z is a subgroup of
the additive group R.
Every group G admits at least two subgroups: the trivial subgroup {e} ≤ G, and
the group G which is a subgroup of itself. If H ≤ G and H is a proper subset of G,
then H is a proper subgroup of G and we write H < G. If H is a subgroup of G,
then H is a group under the restricted binary operation of G.

Definition 5.5.2 Let H be a subgroup of G with a ∈ G. The set of group products


aH = {ah | h ∈ H } is the left coset of H in G represented by a. The collection
H a = {ha | h ∈ H } is the right coset of H in G represented by a.
Let aH be a left coset. The element x ∈ G is a representative of aH if xH =
aH .
Since eH = H e = H , the subgroup H is always a left and right coset of itself in
G represented by e. Observe that 1 + {0, 3, 6, 9} = {1, 4, 7, 10} and 2 + {0, 3, 6, 9} =
{2, 5, 8, 11} are left cosets of {0, 3, 6, 9} in Z12 represented by 1 and 2, respectively.
Also, {σ1 , σ4 }σ1 = {σ1 , σ4 } and {σ1 , σ4 }σ6 = {σ1 σ6 , σ4 σ6 } = {σ6 , σ2 } are right
cosets of {σ1 , σ4 } in S3 represented by σ1 and σ6 , respectively.
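The left cosets a + H of H = {0, 3, 6, 9} in Z12 can be enumerated by machine; the following sketch (names ours) collects them as sets, and the output exhibits the partition of Z12 into three cosets:

```python
H = frozenset({0, 3, 6, 9})
# each a in Z12 yields the coset a + H; distinct cosets survive in the set below
cosets = {frozenset((a + h) % 12 for h in H) for a in range(12)}
for c in sorted(sorted(c) for c in cosets):
    print(c)
# [0, 3, 6, 9]
# [1, 4, 7, 10]
# [2, 5, 8, 11]
```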
Proposition 5.5.3 Let H ≤ G, and let aH , bH be left cosets. Then there exists a
bijection φ : aH → bH defined as φ(ah) = bh for h ∈ H .
Proof To show that φ is 1-1 (one-to-one), suppose that φ(ah1 ) = φ(ah2 ), for
h1 , h2 ∈ H . Then bh1 = bh2 , so that h1 = h2 , and consequently, ah1 = ah2 .
Next let bh ∈ bH . Then clearly, φ(ah) = bh, so that φ is onto. 

The following corollary is immediate.
Corollary 5.5.4 Suppose G is a finite group. Then |aH | = |bH | = |H | for all
a, b ∈ G.
Let G be a group, let H be a subgroup of G. A subset S = {aη }η∈I of G is a left
transversal of H if the family {aη H }η∈I constitutes the collection of all distinct left
cosets of H in G.
Proposition 5.5.5 Let G be a group, let H be a subgroup of G, and let S = {aη }η∈I
be a left transversal of H . Then the collection of distinct left cosets {aη H }η∈I of H
in G forms a partition of the set G, i.e.,

G = ⋃_{η∈I} aη H,

with aη H ∩ aγ H = ∅ whenever aη H ≠ aγ H .

Proof Let g ∈ G. Then gH = aη H for some η ∈ I , thus G ⊆ ⋃_{η∈I} aη H .
Clearly, ⋃_{η∈I} aη H ⊆ G, and so, G = ⋃_{η∈I} aη H . Suppose there exists an
element x ∈ aη H ∩ aγ H for η, γ ∈ I . Then x = aη h1 = aγ h2 for some
h1 , h2 ∈ H . Consequently, aη = aγ h2 h1^{-1} ∈ aγ H . Now, for any h ∈ H , aη h =
aγ h2 h1^{-1} h ∈ aγ H , and so, aη H ⊆ aγ H . By a similar argument aγ H ⊆ aη H , and
so, aη H = aγ H . Thus the collection {aη H }η∈I is a partition of G. □

To illustrate Proposition 5.5.5, let G = S3 , H = {σ1 , σ4 }. Then H = σ1 H = {σ1 , σ4 },
σ2 H = {σ2 , σ5 } and σ3 H = {σ3 , σ6 } are the distinct left cosets of H which form
the partition of S3 ,

{H, σ2 H, σ3 H }.

For another example, let G = Z, H = 3Z. Then the collection of distinct left cosets
is {3Z, 1 + 3Z, 2 + 3Z} which forms a partition of Z.
In many cases, even if the group G is infinite, there may be only a finite number
of left cosets. When this occurs we define the number of left cosets of H in G to be
the index [G : H ] of H in G. For instance, [Z : 3Z] = 3.
If the group G is finite, we have the following classical result attributed to
Lagrange.
Proposition 5.5.6 (Lagrange’s Theorem) Suppose H ≤ G with |G| < ∞. Then
|H | divides |G|.
Proof By Corollary 5.5.4 any two left cosets have the same number of elements,
and this number is |H |. Moreover, the number of left cosets of H in G is [G : H ].
Since the left cosets partition G, we have

|H |[G : H ] = |G|. □


5.6 Homomorphisms of Groups

If A and B are sets, then the functions f : A → B are the basic maps between
A and B. In this section we introduce group homomorphisms: functions preserving
group structure which are the basic maps between groups.
Definition 5.6.1 Let G, G′ be groups. A map ψ : G → G′ is a homomorphism of
groups if

ψ(ab) = ψ(a)ψ(b)

for all a, b ∈ G.
In additive notation, the homomorphism condition is given as

ψ(a + b) = ψ(a) + ψ(b).

For example, the map ψ : Z → Zn given by ψ(a) = (a mod n) is a homomorphism


of groups since

ψ(a + b) = ((a + b) mod n) = (a mod n) + (b mod n) = ψ(a) + ψ(b)

for all a, b ∈ Z. The map ψ : GLn (R) → R× defined as ψ(A) = det(A) is a


homomorphism of groups since by a familiar property of determinants,

ψ(AB) = det(AB) = det(A) det(B) = ψ(A)ψ(B).



Another example comes from elementary calculus: Let R+ = {x ∈ R | x > 0}


denote the multiplicative group of positive real numbers. The map ψ : R+ → R,
defined by ψ(x) = ln(x) is a homomorphism of groups; it is an example of a
homomorphism of a multiplicative group into an additive group.
Definition 5.6.2 A homomorphism of groups ψ : G → G′ which is both injective
and surjective is an isomorphism of groups. Two groups G, G′ are isomorphic if
there exists an isomorphism ψ : G → G′. We then write G ≅ G′.
Proposition 5.6.3 Let n > 0. Then ψ : Z → nZ with ψ(a) = na is an
isomorphism of additive groups.
Proof Note that

ψ(a + b) = n(a + b) = na + nb = ψ(a) + ψ(b),

∀a, b ∈ Z, thus ψ is a homomorphism. Now suppose na = nb. Then na + (−nb) =


n(a + (−b)) = 0, hence a = b. Thus ψ is an injection. Clearly ψ is surjective. 

A map can of course be a bijection without being a group isomorphism. For
example, there are exactly 6! = 720 bijections from S3 to Z6 , yet
none is a group isomorphism. Likewise, the groups Z4 and Z2 × Z2 have the same
number of elements, but are not isomorphic.
Let G be any group. The collection of all groups G′ for which G ≅ G′ is
the isomorphism class of groups represented by G. Essentially, two groups are
contained in the same isomorphism class if and only if they are isomorphic. Our
observation above shows that as groups S3 and Z6 are in different isomorphism
classes.

5.7 Group Structure

Let G be a group with binary operation G × G → G, (a, b) → ab. Let a ∈ G, and


let n > 0 be a positive integer. Then by the notation a^n we mean

a^n = a a a · · · a   (n times).

For n < 0, we write

a^n = a^{-1} a^{-1} a^{-1} · · · a^{-1}   (|n| times).

If n = 0, we set

a^0 = e, the identity element of the group.



Now assume that G is an “additive” group, i.e., a group in which the binary
operation is written additively as +. Let a ∈ G and let n > 0 be a positive integer.
Then by the notation na we mean

na = a + a + a + · · · + a   (n times).

For n < 0, we write

na = (−a) + (−a) + (−a) + · · · + (−a)   (|n| times),

where −a is the inverse of a. If n = 0, we set

0a = 0, the identity element of the group.

Let S ⊆ G be a subset of G. Then G is generated by S if every element of G can
be written as a finite group product over the (possibly infinite) alphabet Σ = {a^n :
a ∈ S, n ∈ Z}. If G is generated by S we write G = ⟨S⟩. For example, if G = S3 ,
S = {σ2 , σ4 }, then S3 is generated by S since

σ2^0 = σ1 , σ2^1 = σ2 , σ2^2 = σ3 , σ4^1 = σ4 , σ2 σ4 = σ5 , σ2^2 σ4 = σ6 .

Every group is generated by some subset of the group. If necessary one could choose
the set G as a generating set for itself, G = ⟨G⟩.
A group G is finitely generated if G = ⟨S⟩ where S is a finite subset of G. Any
finite group is finitely generated. A group G is cyclic if it is generated by a singleton
subset {a}, a ∈ G. If G is cyclic, then there exists an element a ∈ G for which

G = {a^n : n ∈ Z}.

The element a is a generator for the cyclic group G and we write G = ⟨a⟩.
In additive notation, a group G is cyclic if there exists a ∈ G for which

G = {na : n ∈ Z}.

For example, the additive group Z5 is cyclic, generated by 1. To see this note that

1 = 1,
1 +5 1 = 2,
1 +5 1 +5 1 = 3,
1 +5 1 +5 1 +5 1 = 4,
1 +5 1 +5 1 +5 1 +5 1 = 0.

Thus {n · 1 : n ∈ Z} = Z5 . Are there any other generators for Z5 ?


The additive group Z is cyclic on the generator 1: ⟨1⟩ = {n · 1 : n ∈ Z} = Z. Note
that −1 is also a generator for Z.
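The question above can be settled by brute force. The sketch below (names ours) tests, for each a in Zn, whether the multiples of a fill out all of Zn:

```python
def generators(n):
    """All generators of the additive group Zn."""
    def generated(a):
        return {(k * a) % n for k in range(n)}   # the cyclic subgroup <a>
    return [a for a in range(n) if generated(a) == set(range(n))]

print(generators(5))   # [1, 2, 3, 4]
print(generators(8))   # [1, 3, 5, 7]
```

The pattern suggested by the output is that a generates Zn exactly when gcd(a, n) = 1, a fact the chapter's number-theoretic tools make precise.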
On the other hand, S3 is not cyclic: there is no permutation σ ∈ S3 for which

S3 = {σ^n : n ∈ Z}.

For instance, σ2 = [ 0 1 2 ]
                   [ 2 0 1 ] is not a generator of S3 since

σ2^1 = σ2 ,
σ2^2 = σ2 ◦ σ2 = σ3 ,
σ2^3 = σ2 ◦ σ2 ◦ σ2 = σ1 ,
σ2^4 = σ2 ◦ σ2 ◦ σ2 ◦ σ2 = σ2 ,
σ2^5 = σ2 ◦ σ2 ◦ σ2 ◦ σ2 ◦ σ2 = σ3 ,
σ2^6 = σ2 ◦ σ2 ◦ σ2 ◦ σ2 ◦ σ2 ◦ σ2 = σ1 ,

and so on. In fact, {σ2^n : n ∈ Z} = {σ1 , σ2 , σ3 } ≠ S3 .


Proposition 5.7.1 Every subgroup of a cyclic group is cyclic.
Proof Let G = ⟨a⟩ be cyclic, and let H ≤ G. If H = {e}, then H is cyclic and the
proposition is proved. So we assume that H has at least two elements e and b ≠ e.
Since b is non-trivial and H ≤ ⟨a⟩, there exists a positive integer k so that a^k ∈ H .
By the well-ordering principle for natural numbers, there exists a smallest positive
integer m for which a^m ∈ H .
Let h ∈ H . Then h = a^n for some integer n. Now by the division algorithm for
integers, n = mq + r for integers q, r with 0 ≤ r < m. Thus h = (a^m)^q a^r , and so,
a^r = h(a^m)^{-q} ∈ H . But this says that r = 0 since m was chosen to be minimal.
Consequently, h = (a^m)^q ∈ ⟨a^m⟩, and so H = ⟨a^m⟩. □

Let G be any group and let g ∈ G. Then the set

{g^n : n ∈ Z}

is a subgroup of G, which we call the cyclic subgroup of G generated by g. We
denote the cyclic subgroup generated by g as ⟨g⟩. For instance, in S3 , ⟨σ2 ⟩ =
{σ1 , σ2 , σ3 }. In Q+ , under ordinary multiplication,

⟨2⟩ = {2^n : n ∈ Z} = { . . . , 1/8, 1/4, 1/2, 1, 2, 4, 8, . . . } ≤ Q+ .

For an additive example, take G = Z. Then

⟨−3⟩ = {n(−3) : n ∈ Z} = {. . . , 9, 6, 3, 0, −3, −6, −9, . . . } ≤ Z.

Let G be a group and let g ∈ G. The order of g is the order of the cyclic subgroup
⟨g⟩ ≤ G.
Proposition 5.7.2 Let G be a finite group and let g ∈ G. Then the order of the
group ⟨g⟩ is the smallest positive integer m for which g^m = e.
Proof Let m = |⟨g⟩| and let

e = g^0 , g^1 , g^2 , . . . , g^{m−1}     (5.1)

be the list of powers of g. We claim that (5.1) contains m distinct powers of g. If
not, let s be the largest integer 1 ≤ s ≤ m − 1 for which the list

e = g^0 , g^1 , g^2 , . . . , g^{s−1}     (5.2)

contains distinct powers of g. Now, g^s = g^i for some i, 0 ≤ i ≤ s − 1. Thus
g^{s−i} = e. If i ≥ 1, then 1 ≤ s − i ≤ s − 1 with g^{s−i} = g^0 = e, which contradicts
our assumption that list (5.2) contains distinct powers of g. Thus i = 0 and so,
g^s = e. Let g^t ∈ ⟨g⟩, t ∈ Z. By the division algorithm, t = sq + r for some
q, r ∈ Z with 0 ≤ r < s. Thus g^t = g^{sq+r} = (g^s)^q g^r = g^r , and so, g^t is one of
the powers in (5.2). Hence the list (5.2) constitutes the cyclic subgroup ⟨g⟩, but this
says that |⟨g⟩| = s < m, a contradiction. We conclude that the list (5.1) contains
distinct powers of g.
Now g^m = g^i for some 0 ≤ i ≤ m − 1 and arguing as above, we conclude
that i = 0. Thus g^m = e. Now suppose there exists 0 < m′ < m with g^{m′} = e.
Then (5.1) does not contain distinct powers of g, a contradiction. □

Thus, if G is finite, then the order of g ∈ G is the smallest positive integer m for
which g^m = e.
Proposition 5.7.3 Let G be a group and let g ∈ G be an element of finite order m.
Then g^s = e if and only if s is a multiple of m.
Proof Suppose g^s = e. Write s = mq + r, 0 ≤ r < m. Then e = g^s = (g^m)^q g^r =
g^r . We cannot have r > 0 since m is the smallest positive integer with g^m = e. Thus
mq = s. The converse is immediate. □

Proposition 5.7.4 Suppose G is finite with n = |G|. Let g ∈ G. Then g^n = e.
Proof Let m = |⟨g⟩|. By Proposition 5.7.2, g^m = e. By Lagrange's Theorem m | n,
that is, ml = n for some integer l. Hence g^n = g^{ml} = (g^m)^l = e. □
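For the additive group Zn, the order of g is the smallest m > 0 with m·g ≡ 0 (mod n); the sketch below (names ours) computes it by repeated addition and checks, for Z8, that every order divides |G| = 8 and that |G|·g = 0, as Lagrange's Theorem and Proposition 5.7.4 predict:

```python
def order(g: int, n: int) -> int:
    """Order of g in <Zn, +>: the smallest m > 0 with m*g ≡ 0 (mod n)."""
    m, x = 1, g % n
    while x != 0:
        x = (x + g) % n
        m += 1
    return m

for g in range(8):
    assert 8 % order(g, 8) == 0   # Lagrange: the order divides |G|
    assert (8 * g) % 8 == 0       # g^|G| = e, written additively
print(order(6, 8))  # 4
```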


Let G be a finite abelian group and let Z+ = {n ∈ Z : n > 0}. Let T be the set
of positive integers defined as

T = {t ∈ Z+ : g^t = e, ∀g ∈ G}.

Then by Proposition 5.7.4, T is non-empty (since |G| ∈ T ). Hence by the well-
ordering principle, T contains a smallest element f , which is the exponent of G.
The exponent of G is the smallest positive integer f for which g^f = e for all g ∈ G;
one has f ≤ |G|.
For example, the exponent of the group product Z4 × Z6 (a finite abelian group) is
12. Note that 12 ≤ |Z4 × Z6 | = 24. The exponent of a finite abelian group can equal
its order. For instance, let G be a finite group and let ⟨g⟩ be the cyclic subgroup of
G generated by g. Then by Proposition 5.7.2, the exponent of ⟨g⟩ is |⟨g⟩|.
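Since every g ∈ Zn satisfies n·g = 0 and the element 1 has order exactly n, the exponent of a direct product of cyclic groups is the least common multiple of the moduli. A sketch confirming the Z4 × Z6 example (names ours; `math.lcm` requires Python 3.9+):

```python
import math
from functools import reduce

def exponent(moduli):
    """Exponent of Z_{n1} x ... x Z_{nk}: the lcm of the component orders."""
    return reduce(math.lcm, moduli)

print(exponent([4, 6]))   # 12, which is < |Z4 x Z6| = 24
```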
Ultimately, we will show that if G is a finite abelian group of exponent f , then
G contains an element of order f (Proposition 5.7.13).
To this end, we introduce some number theory.

5.7.1 Some Number Theory

Definition 5.7.5 Let a, b ∈ Z and assume that a, b are not both 0. The greatest
common divisor of a, b is the unique positive integer d that satisfies
(i) d divides a and d divides b,
(ii) c divides d whenever c is a common divisor of a and b.
We denote the greatest common divisor as d = gcd(a, b). The famous Euclidean
algorithm computes gcd(a, b) using Algorithm 5.3.4.
Algorithm 5.7.6 (EUCLID)
Input: integers a ≥ b ≥ 1
Output: gcd(a, b)
Algorithm:
r_0 ← a
r_1 ← b
i ← 1
while r_i ≠ 0 do
    i ← i + 1
    r_i ← (r_{i−2} mod r_{i−1})
end-while
output r_{i−1}

Proposition 5.7.7 EUCLID is a polynomial time algorithm.
5.7 Group Structure 93

Proof Since Algorithm 5.3.4 runs in time O(m^2) and EUCLID performs O(m)
iterations of the while loop, EUCLID runs in time O(m^3). □
Let us illustrate how EUCLID works to compute gcd(63, 36). On the inputs a = 63,
b = 36, the algorithm computes gcd(63, 36) by executing three iterations of the
while loop:

r2 = (63 mod 36) = 27 (iteration 1)


r3 = (36 mod 27) = 9 (iteration 2)
r4 = (27 mod 9) = 0.

The algorithm stops with i = 4 and outputs r3 = gcd(63, 36) = 9.
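EUCLID translates directly into a few lines of Python; this sketch (the function name `euclid` is my own) mirrors the while loop above:

```python
def euclid(a, b):
    """EUCLID (Algorithm 5.7.6): compute gcd(a, b) for integers a >= b >= 1."""
    r_prev, r = a, b
    while r != 0:
        r_prev, r = r, r_prev % r  # r_i <- (r_{i-2} mod r_{i-1})
    return r_prev

print(euclid(63, 36))  # 9
```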


The greatest common divisor has the following property.
Proposition 5.7.8 (Bezout's Lemma) Let a, b be integers, not both zero, and
suppose that d = gcd(a, b). Then there exist integers x, y ∈ Z so that

ax + by = d.

Proof The set of integers

aZ + bZ = {am + bn : m, n ∈ Z}

is a subgroup of the cyclic group Z. Thus by Proposition 5.7.1, aZ + bZ = d′Z for
some integer d′ > 0. Thus there exist x, y ∈ Z with ax + by = d′. We claim that
d′ = d. Since a ∈ d′Z and b ∈ d′Z, d′ divides both a and b. Now suppose that
cm = a and cn = b for integers c, m, n. Then cmx + cny = c(mx + ny) = d′, so c | d′.
It follows that d = d′. □

For example, we know that 9 = gcd(63, 36). Let us find x, y ∈ Z so that

63x + 36y = 9.

From iteration 1 and iteration 2, we compute 63 = 36 · 1 + 27 and 36 = 27 · 1 + 9.


Thus,

9 = 36 − 27 · 1
= 36 − (63 − 36 · 1) · 1
= 36 − 63 + 36
= 63 · (−1) + 36 · 2.

And so x = −1, y = 2.
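The back-substitution above can be automated. The following recursive extended Euclidean algorithm is a standard companion to EUCLID, though it is not given in the text; it returns gcd(a, b) together with Bezout coefficients x, y:

```python
def extended_euclid(a, b):
    """Return (d, x, y) with d = gcd(a, b) and a*x + b*y = d."""
    if b == 0:
        return a, 1, 0
    d, x1, y1 = extended_euclid(b, a % b)
    # back-substitute: b*x1 + (a mod b)*y1 = d, and a mod b = a - (a//b)*b
    return d, y1, x1 - (a // b) * y1

d, x, y = extended_euclid(63, 36)
print(d, x, y)  # 9 -1 2
```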

Integers a ≥ b ≥ 1 are relatively prime if gcd(a, b) = 1. For instance, a = 20,
b = 11 are relatively prime since

r2 = (20 mod 11) = 9,


r3 = (11 mod 9) = 2,
r4 = (9 mod 2) = 1,
r5 = (2 mod 1) = 0,

so that gcd(20, 11) = 1.


A common multiple of the non-zero integers a, b is an integer m > 0 for which
a | m and b | m. Let S be the set of all common multiples of a, b. Then by the well-
ordering principle, S has a smallest element, which is the least common multiple
of a, b, denoted as lcm(a, b).
Proposition 5.7.9 Let a, b be non-zero integers. Then

lcm(a, b) = |ab| / gcd(a, b).

Proof Let d = gcd(a, b). Then gcd(a/d, b/d) = 1. Note that |ab|/d^2 is a
common multiple of a/d and b/d, and so lcm(a/d, b/d) ≤ |ab|/d^2. By the division
algorithm,

|ab|/d^2 = lcm(a/d, b/d) · q + r,   0 ≤ r < lcm(a/d, b/d),

for unique q, r. If r > 0, then r is a common multiple of a/d and b/d, which is
impossible. Thus

|ab|/d^2 = lcm(a/d, b/d) · q,

and so there exist integers m, l with

|ab|/d^2 = mqa/d = lqb/d.

Thus q divides both a/d and b/d, hence q = 1, which gives

lcm(a/d, b/d) = |ab|/d^2,

thus

d · lcm(a/d, b/d) = mqa = lqb.

Consequently, d · lcm(a/d, b/d) is a common multiple of a, b, and so

lcm(a, b) ≤ d · lcm(a/d, b/d).

There exist integers s, t so that sa = tb = lcm(a, b), so sa/d = tb/d =
lcm(a, b)/d. Thus lcm(a, b)/d is a common multiple of a/d, b/d. Hence

lcm(a/d, b/d) ≤ lcm(a, b)/d,

which yields

d · lcm(a/d, b/d) = lcm(a, b),

and so lcm(a, b) = |ab|/d. □
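In code, Proposition 5.7.9 gives the usual way to compute the lcm from the gcd; a minimal Python sketch:

```python
from math import gcd

def lcm(a, b):
    """Proposition 5.7.9: lcm(a, b) = |ab| / gcd(a, b) for non-zero a, b."""
    return abs(a * b) // gcd(a, b)

print(lcm(63, 36))  # 252
print(lcm(-4, 6))   # 12
```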



Proposition 5.7.10 Let G = ⟨a⟩ be cyclic of order n and let s be an integer. Then
the order of the cyclic subgroup of G generated by a^s is n/d, where d = gcd(n, s).
Proof Note that n is the smallest positive integer for which a^n = e. The order
of ⟨a^s⟩ is the smallest positive integer m for which (a^s)^m = a^{sm} = e. Note that
sm ≥ n, but even more: n must divide sm. For if not, there exist positive integers
v, w with sm = nv + w with 0 < w < n. Thus e = a^{sm} = a^{nv} a^w = a^w, which
contradicts the minimality of n.
Thus the order of ⟨a^s⟩ is the smallest positive integer m for which n divides sm.
We compute the value of m as follows. Since n divides sm, sm is a common multiple
of s and n, hence sm ≥ lcm(s, n). By Proposition 5.7.9, m ≥ lcm(s, n)/s = n/d,
where d = gcd(n, s). Now, (n/d)s = ns/d = lcm(n, s), which is a multiple of n,
and so (a^s)^{n/d} = e. It follows that m = n/d. □

Corollary 5.7.11 Let G = ⟨a⟩ be cyclic of order n, and let s be an integer. Then
G = ⟨a^s⟩ if and only if gcd(n, s) = 1.
Proof Exercise. □

We can extend the notion of lcm. Let a1 , a2 , . . . , ak , k ≥ 2, be non-zero integers.
Then the least common multiple of a1 , a2 , . . . , ak , denoted as lcm(a1 , a2 , . . . , ak ),
is the smallest positive integer m > 0 which is a multiple of ai for all i, 1 ≤ i ≤ k.
For k ≥ 3, we can define lcm(a1 , a2 , . . . , ak ) inductively as follows:

lcm(a1 , a2 , . . . , ak ) = lcm(lcm(a1 , a2 , . . . , ak−1 ), ak ).
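The inductive definition translates into a fold over the list; a Python sketch (function names are mine):

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    return abs(a * b) // gcd(a, b)

def lcm_list(nums):
    """lcm(a_1, ..., a_k) computed inductively as lcm(lcm(a_1, ..., a_{k-1}), a_k)."""
    return reduce(lcm, nums)

print(lcm_list([4, 6, 10]))  # 60
```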

Proposition 5.7.12 Let G be a finite abelian group. Then the exponent of G is the
lcm of the orders of the elements of G.
Proof Let g ∈ G, let m_g denote the order of g, and let f be the exponent of G. By
Proposition 5.7.3, f is a multiple of m_g; thus f is a common multiple of the set
{m_h}_{h∈G}, and so lcm({m_h}_{h∈G}) ≤ f. Note that g^{lcm({m_h}_{h∈G})} = e for each g ∈ G,
thus lcm({m_h}_{h∈G}) = f by definition of the exponent. □

Proposition 5.7.13 Let G be a finite abelian group with exponent f. Then there
exists an element of G whose order is f.
Proof Let n = |G| and list the elements of G as e = g_0, g_1, . . . , g_{n−1}. Let m_{g_i} be
the order of g_i for 0 ≤ i ≤ n − 1. Clearly, m_{g_0} = 1. Consider the elements g_1, g_2
and let d = gcd(m_{g_1}, m_{g_2}). Then m_{g_1} and m_{g_2}/d are coprime. Hence the element
g_1 g_2^d has order m_{g_1} m_{g_2}/d = lcm(m_{g_1}, m_{g_2}).
Next, consider the elements g_1 g_2^d and g_3. Let

d′ = gcd(lcm(m_{g_1}, m_{g_2}), m_{g_3}).

Then g_1 g_2^d g_3^{d′} has order

lcm(lcm(m_{g_1}, m_{g_2}), m_{g_3}) = lcm(m_{g_1}, m_{g_2}, m_{g_3}).

Continuing in this manner, we can construct an element of order
lcm(m_{g_1}, m_{g_2}, . . . , m_{g_{n−1}}), which is equal to f by Proposition 5.7.12. □

Example 5.7.14 Consider the direct product group

Z2 × Z3 = {(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)}

whose elements have orders

1, 3, 3, 2, 6, 6,

respectively. By Proposition 5.7.12, the exponent of Z2 × Z3 is
lcm(1, 3, 3, 2, 6, 6) = 6, and (1, 2) has order 6. Note that Z2 × Z3 = ⟨(1, 2)⟩ and so
Z2 × Z3 is a finite cyclic group of order 6.
We close the chapter with a classical theorem of number theory.
Theorem 5.7.15 (The Chinese Remainder Theorem) Let n1 , n2 be integers that
satisfy n1 , n2 > 0 and gcd(n1 , n2 ) = 1 and let a1 , a2 be any integers. Then the
system of congruences

x ≡ a1 (mod n1 )
x ≡ a2 (mod n2 )

has a unique solution modulo n1 n2 .


Proof By Bezout’s lemma, there exist integers x1 , x2 so that n1 x1 +n2 x2 = 1. Then

(a1 − a2 )(n1 x1 + n2 x2 ) = (a1 − a2 ),

and

x = a1 + n1 x1 (a2 − a1 ) = a2 + n2 x2 (a1 − a2 ),

is a solution modulo n1 n2 .

To show that this solution is unique modulo n1 n2, let x′ be some other solution.
We may assume without loss of generality that x > x′. Then n1 | (x − a1) and
n1 | (x′ − a1), and so n1 | (x − x′). Likewise, n2 | (x − x′). Thus x − x′ is
a common multiple of n1 and n2, and so lcm(n1, n2) ≤ x − x′. By the division
algorithm,

x − x′ = lcm(n1, n2) · q + r

for some q, r with 0 ≤ r < lcm(n1, n2). But then r = (x − x′) − lcm(n1, n2) · q is
a common multiple of n1 and n2, and so r = 0. Thus lcm(n1, n2) | (x − x′). By
Proposition 5.7.9, n1 n2 = lcm(n1, n2), thus x′ ≡ x (mod n1 n2). □

Example 5.7.16 Let n1 = 2, n2 = 3, a1 = 1, a2 = 2. Then the Chinese Remainder
theorem applies to show that there is a unique residue x in Z6 for which

x ≡ 1 (mod 2)
x ≡ 2 (mod 3).

In fact, as one can check, x = 5. Observe that the residue x = 5 has order 6 in Z6
and the element (a1, a2) = (1, 2) has order 6 in the group product Z2 × Z3. We have
Z6 = ⟨5⟩ and Z2 × Z3 = ⟨(1, 2)⟩. Both Z6 and Z2 × Z3 are cyclic groups of order
6; there is an isomorphism of groups

ψ : Z6 → Z2 × Z3

defined by 5 ↦ (1, 2).



The Chinese Remainder theorem (CRT) plays a central role in our development
of cryptographic systems in subsequent chapters. For instance, the CRT is critical
to the definition of the RSA public key cryptosystem (Section 9.2) and the RSA
Digital Signature scheme (Section 10.2). It is also used in the construction of the
Blum-Blum-Shub pseudorandom sequence in Section 11.4.5.
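The constructive proof of Theorem 5.7.15 can be turned into a small solver. This Python sketch (mine, not the text's) follows the proof's formula x = a1 + n1·x1·(a2 − a1), obtaining the Bezout coefficient x1 with Python's built-in modular inverse pow(n1, −1, n2) (available in Python 3.8+):

```python
def crt(a1, n1, a2, n2):
    """Solve x = a1 (mod n1), x = a2 (mod n2) when gcd(n1, n2) = 1,
    following the proof: x = a1 + n1*x1*(a2 - a1), where n1*x1 + n2*x2 = 1."""
    x1 = pow(n1, -1, n2)  # n1*x1 = 1 (mod n2), a Bezout coefficient for n1
    return (a1 + n1 * x1 * (a2 - a1)) % (n1 * n2)

print(crt(1, 2, 2, 3))  # 5, as in Example 5.7.16
```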

5.8 Exercises

1. Let S = {a, b, c}, and let ∗ be the binary operation on S defined by the following
table:

∗ a b c
a c c a
b b a a
c a b b

(a) Compute (b ∗ (a ∗ a)) ∗ (b ∗ c).
(b) Determine whether S, ∗ is a semigroup.
2. Let G be a group. Prove that the identity element e is unique. Prove that for
a ∈ G, there exists a unique inverse element a −1 with aa −1 = e = a −1 a.
3. Let GL2(R) be the group of invertible 2 × 2 matrices over R.

   (a) Show that the matrix
           (4 5)
       A = (2 3)
       is in GL2(R).
   (b) Compute A^{−1}.
4. Let S5 denote the symmetric group on the 5-letter set {0, 1, 2, 3, 4}, and
let

       (0 1 2 3 4)          (0 1 2 3 4)
   σ = (3 2 1 0 4)  and τ = (1 0 3 4 2)
be elements of S5 .
Compute the following.
(a) τ −1
(b) σ ◦τ
(c) τ ◦σ
(d) |S5 |
5. Let S2 denote the symmetric group on the 2-letter set {0, 1}.
(a) List the elements of S2 in permutation notation.
(b) Compute the group table for S2 .
6. Let σ ∈ S5 be the permutation given in Exercise 4. Decompose σ into a product
of transpositions. Is σ even or odd?
7. Prove that a permutation in Sn , n ≥ 2, cannot be both even and odd.
8. Compute the following.
(a) (17 mod 6)
(b) (256 mod 31)
(c) (−2245 mod 7)
9. Compute the binary operation table for the group of residues Z6 , +.
10. Compute (5, 7) · (2, 13) in the direct product group Z8 × Z20 .
11. List all of the subgroups of the group Z20 , +.
12. Let G be a group and H be a finite non-empty subset of G that is closed under
the binary operation of G. Prove that H is a subgroup of G.
13. Let ψ : Z6 → Z5 be the map defined as a → (a mod 5) for a ∈ Z6 . Determine
whether ψ is a homomorphism of groups.
14. Compute the cyclic subgroup of R+ generated by π .
15. Compute the cyclic subgroup of Z generated by −1.

16. Let Q× = {x ∈ Q | x ≠ 0} denote the multiplicative group of non-zero
rationals. Determine whether Q× is finitely generated.
17. Compute d = gcd(28, 124). Find integers x, y so that 28x + 124y = d.
18. Find the order of the element 3 in Z15.
19. Find the order of the element
        (0 1 2 3)
    σ = (3 2 1 0)
    in S4.
20. Find the order of the element (2, 3) in the direct product group Z4 × Z8 .
21. Compute the exponent and the order of the group Z10 × Z12 .
22. Let d = gcd(a, b). Prove that gcd(a/d, b/d) = 1.
23. Find the unique solution in Z4300 of the system of congruences

x ≡ 30 (mod 43)
x ≡ 87 (mod 100).
Chapter 6
Algebraic Foundations: Rings and Fields

6.1 Introduction to Rings and Fields

In contrast to a group, a ring is a set together with two binary operations.


Definition 6.1.1 A ring is a non-empty set R together with two binary operations,
addition, +, and multiplication, ·, that satisfy
(i) R, + is an abelian group with identity element 0R
(ii) For all a, b, c ∈ R,

a · (b · c) = (a · b) · c

(iii) For all a, b, c ∈ R,

a · (b + c) = a · b + a · c,

and

(a + b) · c = a · c + b · c

It is often more convenient to denote the multiplication a · b by ab.


The integers Z, together with ordinary addition + and ordinary multiplication ·,
are a ring called the ring of integers. Likewise, Q, +, · and R, +, · are rings. The
collection of residues Zn modulo n is a ring under addition +n and multiplication
·n defined as

(a mod n) +n (b mod n) = ((a + b) mod n),

(a mod n) ·n (b mod n) = ((ab) mod n).

© Springer Nature Switzerland AG 2022
R. G. Underwood, Cryptography for Secure Encryption, Universitext,
https://doi.org/10.1007/978-3-030-97902-7_6

For instance, Z4 is a ring with binary operations given by the tables

+4 | 0 1 2 3        ·4 | 0 1 2 3
 0 | 0 1 2 3         0 | 0 0 0 0
 1 | 1 2 3 0         1 | 0 1 2 3
 2 | 2 3 0 1         2 | 0 2 0 2
 3 | 3 0 1 2         3 | 0 3 2 1

Here are some ways to construct a new ring from a given ring.
Example 6.1.2 Let R be a ring, and let x be an indeterminate. The collection of all
polynomials

p(x) = a_0 + a_1 x + a_2 x^2 + · · · + a_{n−1} x^{n−1} + a_n x^n,   a_i ∈ R,

is a ring under the usual polynomial addition and multiplication; thus

Σ_{i=0}^{n} a_i x^i + Σ_{i=0}^{n} b_i x^i = Σ_{i=0}^{n} (a_i + b_i) x^i,

(Σ_{i=0}^{m} a_i x^i)(Σ_{j=0}^{n} b_j x^j) = Σ_{i=0}^{m} Σ_{j=0}^{n} a_i b_j x^{i+j}.

We denote this ring of polynomials by R[x]. If a_n ≠ 0, then the degree of p(x) is
n. We assume that the degree of the zero polynomial p(x) = 0 is −∞. □

If we take the ring R to be the polynomial ring R[x] and let y be an indeterminate,
then we can form the ring of polynomials in y over R[x], which is denoted
as R[x][y] (or alternatively, as R[x, y]). This is the ring of polynomials in two
indeterminates x and y. In R[x, y], x and y commute: xy = yx.
This construction can be extended inductively: let R be a ring, and let
{x1 , x2 , . . . , xn } denote a set of indeterminates. The set of all polynomials
in {x1 , x2 , . . . , xn } over R is a ring with the obvious polynomial addition
and multiplication; the xi commute. This ring of polynomials is denoted as
R[x1 , x2 , . . . , xn ].
Example 6.1.3 Let R be a ring and let Matn (R) denote the collection of all n ×
n matrices with entries in R. Then Matn (R) is a ring under matrix addition and
multiplication. The ring Matn (R) is the ring of n × n matrices over R. 

For instance, if we take R = R, then Matn (R) is the ring of n × n matrices with
entries in R, which is quite familiar to students of linear algebra.

We may also take n = 2 and R = Z10 to obtain Mat2(Z10), which is the ring of
2 × 2 matrices over Z10. In Mat2(Z10), we have

(3 7)   (0 9)   (3+0 7+9)   (3 6)
(6 0) + (4 1) = (6+4 0+1) = (0 1),

(3 7) (0 9)   (3·0+7·4  3·9+7·1)   (8 4)
(6 0) (4 1) = (6·0+0·4  6·9+0·1) = (0 4).

(Note: entries are reduced modulo 10.)
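These computations in Mat2(Z10) are easy to reproduce; a small Python sketch of 2 × 2 matrix arithmetic with entries reduced modulo n (helper names are mine):

```python
def mat_add(A, B, n):
    """Entrywise sum of two 2x2 matrices over Z_n."""
    return [[(a + b) % n for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_mul(A, B, n):
    """Product of two 2x2 matrices over Z_n."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) % n
             for j in range(2)] for i in range(2)]

A = [[3, 7], [6, 0]]
B = [[0, 9], [4, 1]]
print(mat_add(A, B, 10))  # [[3, 6], [0, 1]]
print(mat_mul(A, B, 10))  # [[8, 4], [0, 4]]
```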


Just as we did for groups (see Section 5.4), we can form the direct product of a
collection of rings.
Example 6.1.4 If R_i, for i = 1, . . . , k, is a collection of rings, then the Cartesian
product Π_{i=1}^{k} R_i is a ring, where

(a1 , a2 , . . . , ak ) + (b1 , b2 , . . . , bk ) = (a1 + b1 , a2 + b2 , . . . , ak + bk ),

and

(a1 , a2 , . . . , ak )(b1 , b2 , . . . , bk ) = (a1 b1 , a2 b2 , . . . , ak bk ).

Here, ai + bi denotes the addition in the component ring Ri , and ai bi denotes the
multiplication in the component ring Ri , 1 ≤ i ≤ k. 

To illustrate a direct product, we take k = 2 and R1 = R2 = Z and consider the
ring Z × Z. In Z × Z, (5, 4) + (−3, 1) = (2, 5) and (3, 2)(−1, 6) = (−3, 12).
A ring R is commutative if

ab = ba

for all a, b ∈ R. A ring with unity is a ring R for which there exists a multiplicative
identity element, denoted by 1R and distinct from 0R that satisfies

1R a = a = a1R

for all a ∈ R.
A subring of a ring with unity R is a subset S of R which is a ring under the
binary operations of R restricted to S. If S is a subring of R, then S is a ring with
unity 0_S = 0_R, 1_S = 1_R. For example, Z is a subring of Q.
Suppose R is a ring with unity. A unit of R is an element u ∈ R for which there
exists an element a ∈ R that satisfies

au = 1R = ua.

The element a is a multiplicative inverse of u and is denoted by u^{−1}.


For example, the ring Z is a ring with unity with identity element 1Z = 1, and
the only units are 1 and −1. On the other hand, in the ring Q, 1Q = 1, and every
non-zero element is a unit.
Let R be a ring. A zero divisor in R is a non-zero element a ∈ R for which
there exists a non-zero element b ∈ R so that ab = 0. For example 4 and 6 are zero
divisors in the ring Z8 since 4 · 6 = 0 in Z8 (equivalently, (4 · 6) ≡ 0 (mod 8)).
A commutative ring with unity in which there are no zero divisors is an integral
domain. For instance, Z is an integral domain.
A division ring is a ring R with unity in which every non-zero element is a unit.
A commutative division ring is a field. For example, the ring Q is a field, as is R. It
is not hard to show that every field is an integral domain; however, the converse
is false; see Section 6.5, Exercise 2.
We note that if F is a field, then the ring of polynomials F [x] is an integral
domain.
Let F be a field. A subfield of F is a subring E of F, which is a field under the
binary operations of F restricted to E. For example, Q is a subfield of R.
Let E and F be fields. If E is a subfield of F, then F is a field extension of E,
and we write F/E. For instance, R is a field extension of Q.

6.1.1 Polynomials in F [x]

Let F be a field and let F [x] be the ring of polynomials over F .


Proposition 6.1.5 (Division Theorem) Let f(x) and g(x) be polynomials in F[x]
with f(x) ≠ 0. There exist unique polynomials q(x) and r(x) for which

g(x) = f(x)q(x) + r(x),

where deg(r(x)) < deg(f(x)).
Proof Our proof is by induction on the degree of g(x). If deg(g(x)) < deg(f(x)),
then g(x) = f(x) · 0 + g(x), as required by the Division Theorem. So we assume that
deg(g(x)) ≥ deg(f(x)). Let f(x) = a_0 + a_1 x + · · · + a_m x^m, a_m ≠ 0, and g(x) =
b_0 + b_1 x + · · · + b_{m+n} x^{m+n}, b_{m+n} ≠ 0, n ≥ 0. For the trivial case, suppose that
deg(g(x)) = 0. Then g(x) = c, f(x) = d, with c, d non-zero, and so c = (c/d)d + 0,
which proves the theorem.
Next, put

h(x) = g(x) − (b_{m+n}/a_m) x^n f(x).

Then deg(h(x)) < deg(g(x)), and thus by the induction hypothesis, there exist
q_1(x) and r(x) so that

h(x) = f(x)q_1(x) + r(x),

with deg(r(x)) < deg(f(x)). Now,

g(x) = f(x) ( q_1(x) + (b_{m+n}/a_m) x^n ) + r(x),

which proves the theorem.
We leave it to the reader to prove the uniqueness of q(x) and r(x). □
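The induction in the proof is effectively the familiar long-division procedure. The following Python sketch (not from the text) divides g(x) by f(x) in Zp[x], representing a polynomial by its coefficient list from the constant term up; it assumes p is prime, so that the leading coefficient of f is invertible:

```python
def trim(cs):
    """Drop trailing zero coefficients, keeping at least the constant term."""
    while len(cs) > 1 and cs[-1] == 0:
        cs.pop()
    return cs

def poly_divmod(g, f, p):
    """Divide g(x) by f(x) in Z_p[x] (coefficient lists, constant term first).
    Returns (q, r) with g = f*q + r and deg r < deg f (Proposition 6.1.5)."""
    r = trim([c % p for c in g])
    q = [0] * max(len(r) - len(f) + 1, 1)
    inv = pow(f[-1] % p, -1, p)       # leading coefficient of f is a unit in Z_p
    while len(r) >= len(f) and r != [0]:
        shift = len(r) - len(f)
        c = (r[-1] * inv) % p         # cancel the leading term of r
        q[shift] = c
        for i, fi in enumerate(f):
            r[i + shift] = (r[i + shift] - c * fi) % p
        trim(r)
    return trim(q), r

# In Z_5[x]: divide x^3 + 2x + 1 by x + 4 (that is, x - 1).
q, r = poly_divmod([1, 2, 0, 1], [4, 1], 5)
print(q, r)  # [3, 1, 1] [4], i.e., q = 3 + x + x^2 and r = 4
```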

Let f (x) ∈ F [x]. A zero (or root) of f (x) in F is an element a ∈ F for which
f (a) = 0.
Proposition 6.1.6 (Factor Theorem) Let f (x) ∈ F [x] be a polynomial and let
a ∈ F be a zero of f (x). Then x − a divides f (x).
Proof By the Division Theorem, there exist polynomials q(x), r(x) ∈ F [x] for
which

f (x) = (x − a)q(x) + r(x),

with deg(r(x)) < deg(x − a) = 1, and so, r(x) = r for some r ∈ F . Now,

0 = f (a) = (a − a)q(a) + r = 0 · q(a) + r = r,

and so, r(x) = 0, and consequently, x − a | f (x). 



Proposition 6.1.7 Let f(x) ∈ F[x] be a polynomial of degree n ≥ 0. Then f(x)
can have at most n roots in F.
Proof If f(x) has degree 0, then f(x) = c, c ≠ 0, and so f(x) has no zeros in F.
So we assume that deg(f(x)) = n ≥ 1. If a_1 ∈ F is a zero of f(x), then by the
Factor Theorem,

f (x) = (x − a1 )g1 (x),

where g1 (x) is a polynomial in F [x] of degree n − 1. If a2 ∈ F is a zero of g1 (x),


then again by the Factor Theorem,

f (x) = (x − a1 )(x − a2 )g2 (x),

with g2 (x) ∈ F [x] with deg(g2 (x)) = n − 2. We continue in this manner until we
arrive at the factorization

f (x) = (x − a1 )(x − a2 )(x − a3 ) · · · (x − ak )gk (x),



where gk (x) has no roots in F and deg(gk ) = n − k. Now, a1 , a2 , . . . , ak is a list of


k zeros of f (x) in F . Note that k ≤ n since the degrees on the left-hand side and
the right-hand side must be equal.
We claim that the a_i are all of the zeros of f(x) in F. By way of contradiction,
suppose that b ∈ F satisfies b ≠ a_i for 1 ≤ i ≤ k, with f(b) = 0. Then

0 = f (b) = (b − a1 )(b − a2 )(b − a3 ) · · · (b − ak )gk (b),

with none of the factors on the right-hand side equal to 0. Thus F has zero divisors,
which is impossible since F is an integral domain. 


6.2 The Group of Units of Zn

Let R be a ring with unity 1R . As a ring, R is provided with two binary operations:
addition and multiplication. Under the addition, R is an abelian group. Is there a
group arising from the multiplication of R?
Proposition 6.2.1 Let R be a ring with unity. Let U (R) denote the collection of
units of R. Then U (R) together with the ring multiplication is a group.
Proof We only have to notice that ab is a unit whenever a and b are. 

The group U (R), · is the group of units of R. For example, Z is a (commutative)
ring with unity. The group of units U (Z) is the (abelian) group {1, −1} with group
table:

· 1 −1
1 1 −1
−1 −1 1

The ring of residues Zn is a commutative ring with unity. Our goal in this section
is to compute U (Zn ).
Proposition 6.2.2 The units of the ring Zn are precisely those residues m, 1 ≤ m ≤
n − 1, that satisfy gcd(n, m) = 1.
Proof If gcd(n, m) = 1, then by Proposition 5.7.8, there exist x, y so that mx +
ny = 1. Thus 1 − mx = ny, which says that n divides 1 − mx. Hence mx ≡ 1
(mod n). Consequently, m is a unit with m^{−1} ≡ x (mod n).
For the converse, suppose that m is a unit in Zn . Then there exists x ∈ Zn so that
mx = 1 in Zn . Thus (mx mod n) = 1 so that mx = nq +1 for some q ∈ Z. Thus the
algorithm EUCLID yields gcd(n, mx) = 1, which implies that gcd(n, m) = 1.  

For example, by Proposition 6.2.2, U (Z8 ) = {1, 3, 5, 7}. The group table for U (Z8 )
is

·8 1 3 5 7
1 1 3 5 7
3 3 1 7 5
5 5 7 1 3
7 7 5 3 1

For another example, we see that U (Z5 ) = {1, 2, 3, 4}. The group table for U (Z5 )
is

·5 1 2 3 4
1 1 2 3 4
2 2 4 1 3
3 3 1 4 2
4 4 3 2 1

In U (Z5 ), observe that 2 · 3 = 1, and thus 2−1 = 3.
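The unit groups above can be generated mechanically from Proposition 6.2.2; a Python sketch (function names are mine):

```python
from math import gcd

def units(n):
    """U(Z_n): residues 1 <= m <= n-1 with gcd(n, m) = 1 (Proposition 6.2.2)."""
    return [m for m in range(1, n) if gcd(n, m) == 1]

print(units(8))  # [1, 3, 5, 7]
print(units(5))  # [1, 2, 3, 4]
# Inverse of 2 in U(Z_5), found by searching the group:
inv2 = next(x for x in units(5) if (2 * x) % 5 == 1)
print(inv2)      # 3
```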


Proposition 6.2.3 Let p be a prime number. Then U (Zp ) = {1, 2, 3, . . . , p − 1}.
Consequently, |U (Zp )| = p − 1.
Proof {1, 2, 3, . . . , p − 1} are the residues in Zp that are relatively prime to p. 

Corollary 6.2.4 Let p be a prime number. Then Zp is a field.
Proof Certainly, Zp is a commutative ring with unity. By Proposition 6.2.3, it is a
division ring. 

Let Z+ denote the set of positive integers.
Definition 6.2.5 (Euler’s Function) Let φ : Z+ → Z be the function defined by
the rule: φ(n) is the number of integers 1 ≤ m ≤ n − 1 that satisfy gcd(m, n) = 1.
Then φ is Euler’s function.
For instance, φ(5) = 4, φ(8) = 4. If p is prime, then φ(p) = p − 1.
Proposition 6.2.2 says that φ(n) = |U (Zn )|.
Theorem 6.2.6 (Fermat’s Little Theorem) Let p be a prime number and let a be
an integer with gcd(a, p) = 1. Then

a^{p−1} ≡ 1 (mod p).

Proof Let a′ = (a mod p). Then gcd(a′, p) = 1, and so a′ ∈ U(Zp). By
Proposition 6.2.3, |U(Zp)| = p − 1, and so by Proposition 5.7.4, (a′)^{p−1} ≡ 1
(mod p).
Now, (a′ mod p) = (a mod p), and so we may write a = a′ + pm for some
integer m. Then by the binomial theorem,

a^{p−1} = (a′ + pm)^{p−1} = (a′)^{p−1} + Σ_{i=1}^{p−1} (p−1 choose i) (a′)^{p−1−i} (pm)^i,

and so,

a^{p−1} ≡ (a′)^{p−1} (mod p).

The result follows. □



Corollary 6.2.7 Let p be a prime number and let a be any integer. Then

a^p ≡ a (mod p).

Proof If gcd(a, p) = 1, then a^{p−1} ≡ 1 (mod p) by Theorem 6.2.6. Thus a ·
a^{p−1} = a^p ≡ a (mod p). If p | a, then a ≡ 0 (mod p), and hence a^p ≡ 0^p ≡
0 ≡ a (mod p). □
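Both results are easy to spot-check with Python's built-in three-argument pow, which performs modular exponentiation; a quick sketch over the first few primes:

```python
# Spot-check of Fermat's Little Theorem and Corollary 6.2.7
# using Python's built-in modular exponentiation pow(a, e, p).
for p in (2, 3, 5, 7, 11, 13):
    for a in range(1, p):          # gcd(a, p) = 1 for 1 <= a < p
        assert pow(a, p - 1, p) == 1
    for a in range(p):             # a^p = a (mod p) for every a
        assert pow(a, p, p) == a % p
print("Fermat's Little Theorem verified for p up to 13")
```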

Proposition 6.2.8 Let p be prime and let a ∈ U(Zp). Then a is its own inverse in
U(Zp) if and only if a ≡ 1 (mod p) or a ≡ −1 (mod p).
Proof If a ≡ ±1 (mod p), then a^2 ≡ 1 (mod p), so a ≡ a^{−1} (mod p).
Conversely, suppose that a^2 ≡ 1 (mod p). Then (a + 1)(a − 1) ≡ 0 (mod p).
Since p is prime, this implies that p | (a + 1) or p | (a − 1), and so a ≡ ±1
(mod p). □

Theorem 6.2.9 (Wilson’s Theorem) Let p be prime. Then (p − 1)! ≡ −1
(mod p).
Proof If p = 2, then (p − 1)! ≡ 1! ≡ −1 (mod 2). So we assume that p > 2.
Consider the list of residues

1, 2, 3 . . . , p − 2, p − 1.

Each of these residues has a unique inverse in the list. By Proposition 6.2.8, the
inverse of 1 is 1, the inverse of p − 1 is p − 1, and all other residues have distinct
inverses in the list 2, 3, . . . , p − 2. Thus,

(p − 1)! = Π_{i=1}^{p−1} i ≡ p − 1 ≡ −1 (mod p). □




Euler has given a generalization of Fermat’s Little Theorem.


Theorem 6.2.10 (Euler’s Theorem) Let n > 1 be an integer and let a be an
integer with gcd(a, n) = 1. Then

a^{φ(n)} ≡ 1 (mod n).

Proof Let a′ = (a mod n). Since gcd(a, n) = 1, gcd(a′, n) = 1, and so a′ ∈
U(Zn). Thus (a′)^{φ(n)} ≡ 1 (mod n) by Proposition 5.7.4. It follows that a^{φ(n)} ≡ 1
(mod n). □


6.2.1 A Formula for Euler’s Function

As we have seen, Euler’s function φ counts the number of units in Zn . Our goal here
is to derive a formula for φ. We already know that φ(p) = p − 1 for p prime. We
begin with a generalization of this formula.
Proposition 6.2.11 Let p be a prime number and let a ≥ 1 be an integer. Then

φ(p^a) = p^a − p^{a−1}.

Proof The integers m with 1 ≤ m ≤ p^a that are not relatively prime to p^a are
precisely those of the form m = kp for 1 ≤ k ≤ p^{a−1}, and there are exactly p^{a−1}
such integers. So the number of integers m, 1 ≤ m ≤ p^a, that are relatively prime
to p^a is p^a − p^{a−1}. Thus φ(p^a) = p^a − p^{a−1}. □

Proposition 6.2.12 Suppose that m, n ≥ 1 are integers with gcd(m, n) = 1. Then
φ(mn) = φ(m)φ(n).
Proof We begin by writing all integers k, 1 ≤ k ≤ mn, in the following
arrangement (m rows × n columns):

1    m + 1    2m + 1    · · ·    (n − 1)m + 1
2    m + 2    2m + 2    · · ·    (n − 1)m + 2
3    m + 3    2m + 3    · · ·    (n − 1)m + 3
...
m    2m       3m        · · ·    mn

Now, exactly φ(m) rows of this matrix contain integers that are relatively prime
to mn, and each of these φ(m) rows contains exactly φ(n) integers that are relatively
prime to mn. Thus the total number of integers 1 ≤ k ≤ mn that are relatively prime
to mn is φ(m)φ(n). □


Proposition 6.2.13 Let n ≥ 2 and let n = p_1^{e_1} p_2^{e_2} · · · p_k^{e_k} be the prime factor
decomposition of n. Then

φ(n) = n (1 − 1/p_1)(1 − 1/p_2) · · · (1 − 1/p_k).

Proof Since gcd(p_i^{e_i}, p_j^{e_j}) = 1 for 1 ≤ i, j ≤ k, i ≠ j, we have φ(n) =
φ(p_1^{e_1}) φ(p_2^{e_2}) · · · φ(p_k^{e_k}) by Proposition 6.2.12. Thus

φ(n) = (p_1^{e_1} − p_1^{e_1−1})(p_2^{e_2} − p_2^{e_2−1}) · · · (p_k^{e_k} − p_k^{e_k−1})   by Proposition 6.2.11
     = p_1^{e_1}(1 − 1/p_1) p_2^{e_2}(1 − 1/p_2) · · · p_k^{e_k}(1 − 1/p_k)
     = (p_1^{e_1} p_2^{e_2} · · · p_k^{e_k}) (1 − 1/p_1)(1 − 1/p_2) · · · (1 − 1/p_k)
     = n (1 − 1/p_1)(1 − 1/p_2) · · · (1 − 1/p_k). □

For example, φ(2200) = φ(2^3 · 5^2 · 11) = 2200(1 − 1/2)(1 − 1/5)(1 − 1/11) = 800.

6.3 U(Zp) Is Cyclic

We prove a theorem that is of fundamental importance in cryptography.


Theorem 6.3.1 Let p be a prime number. Then the group of units U(Zp) is cyclic,
i.e., there exists an element g ∈ U(Zp) for which {g^n : n ∈ Z} = U(Zp).
Proof Since Zp is a commutative ring with unity, U(Zp) is a finite abelian group
whose identity element is the residue 1. Let f be the exponent of U(Zp). We have
f ≤ |U(Zp)| = p − 1.
By Corollary 6.2.4, Zp is a field. Consider the polynomial x^f − 1 in Zp[x]. By
the definition of exponent, g^f = 1 for all g ∈ U(Zp), and so x^f − 1 has p − 1
distinct zeros in Zp. Thus f = p − 1 by Proposition 6.1.7.
By Proposition 5.7.13, there exists an element g ∈ U(Zp) which has order p − 1.
But then |⟨g⟩| = p − 1 so that ⟨g⟩ = U(Zp). Thus U(Zp) is cyclic. □

An element g ∈ U (Zp ) that generates U (Zp ) is a primitive root modulo p. For
example, 2 is a primitive root modulo 5 since 2 generates U (Z5 ), and 3 is a primitive
root modulo 7 since 3 generates U (Z7 ).
For n not prime, U (Zn ) may not be cyclic. For example, U (Z8 ) is not cyclic.
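A primitive root can be found by brute-force search over element orders; this Python sketch (helper names mine) recovers the primitive roots quoted above and confirms that U(Z8) is not cyclic:

```python
def order_mod(a, n):
    """Multiplicative order of a in U(Z_n); assumes gcd(a, n) = 1."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

def primitive_root(p):
    """Smallest primitive root modulo an odd prime p: a generator of U(Z_p)."""
    return next(g for g in range(2, p) if order_mod(g, p) == p - 1)

print(primitive_root(5))  # 2
print(primitive_root(7))  # 3
# U(Z_8) is not cyclic: no element has order 4 = |U(Z_8)|.
print(max(order_mod(a, 8) for a in (1, 3, 5, 7)))  # 2
```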

Theorem 6.3.2 U(Zn) is cyclic if and only if n = 2, 4, p^e, or 2p^e, where p is an odd
prime and e ≥ 1.
Proof For a proof, see [28, Proposition 4.1.3] or [47, Theorem 9.15]. □

Let n > 0 and let a ∈ U(Zn). Then the order of a in U(Zn) is the order of the
cyclic subgroup ⟨a⟩ ≤ U(Zn); it is the smallest positive integer m for which a^m = 1
in U(Zn), or equivalently, a^m ≡ 1 (mod n) (Proposition 5.7.2).
Theorem 6.3.1 says that U (Zp ), p prime, contains an element of order p − 1.
For composite moduli, we have the following.
Proposition 6.3.3 Let n = n_1 n_2, gcd(n_1, n_2) = 1. Let a ∈ U(Zn), a_1 =
(a mod n_1), a_2 = (a mod n_2). Let l be the order of a_1 in U(Z_{n_1}) and let m be the
order of a_2 in U(Z_{n_2}). Then the order of a in U(Zn) is lcm(l, m).
Proof We have a_1^{lcm(l,m)} ≡ 1 (mod n_1) and a_2^{lcm(l,m)} ≡ 1 (mod n_2) since l |
lcm(l, m) and m | lcm(l, m). By the binomial theorem,

a^{lcm(l,m)} ≡ a_1^{lcm(l,m)} (mod n_1)   and   a^{lcm(l,m)} ≡ a_2^{lcm(l,m)} (mod n_2).

So n_1 | (a^{lcm(l,m)} − 1) and n_2 | (a^{lcm(l,m)} − 1), and thus a^{lcm(l,m)} − 1 is a common
multiple of n_1, n_2. By the division algorithm, lcm(n_1, n_2) | (a^{lcm(l,m)} − 1). Since
gcd(n_1, n_2) = 1, lcm(n_1, n_2) = n_1 n_2 = n, and so n | (a^{lcm(l,m)} − 1), and hence
a^{lcm(l,m)} ≡ 1 (mod n).
We show that lcm(l, m) is the order of a in U(Zn). Suppose a^k ≡ 1 (mod n) for
some k. Then a^k ≡ a_1^k ≡ 1 (mod n_1). Write k = lq + r, 0 ≤ r < l. Then r = 0,
hence l | k. Likewise, m | k, and so k is a common multiple of l, m, which yields
k ≥ lcm(l, m). □

For example, the order of 2 in U (Z35 ) can be computed as follows. The order of
2 in U (Z5 ) is 4, and the order of 2 in U (Z7 ) is 3. Thus the order of 2 in U (Z35 ) is
lcm(4, 3) = 12.
In Section 11.4, we will apply Proposition 6.3.3 to the case n = pq, where p and
q are distinct primes.
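The order-of-2 computation can be replayed in Python; a sketch (order_mod is a naive search, adequate for small moduli):

```python
from math import gcd

def order_mod(a, n):
    """Order of a in U(Z_n) by direct search; assumes gcd(a, n) = 1."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

def lcm(x, y):
    return x * y // gcd(x, y)

# Proposition 6.3.3 for n = 35 = 5 * 7 and a = 2:
l, m = order_mod(2, 5), order_mod(2, 7)
print(l, m)              # 4 3
print(lcm(l, m))         # 12
print(order_mod(2, 35))  # 12, agreeing with the proposition
```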

6.4 Exponentiation in Zn

Many important cryptosystems depend on the computation of

(a^b mod n),

where a, b, and n are integers with a > 0, b ≥ 0, and n > 0. Here is an algorithm
that computes (a^b mod n).

Algorithm 6.4.1 (EXP_MOD_n)
Input: integers a, b, n, n > a ≥ 0, n > b ≥ 0, n > 0, m = ⌊log2(n)⌋ + 1,
b encoded in binary as b = b_m b_{m−1} · · · b_1
Output: (a^b mod n)
Algorithm:
c ← 1
for i = 1 to m
    if b_i = 1 then c ← (ca mod n)
    a ← (a^2 mod n)
next i
output c


To see how EXP_MOD_n works, we compute (39 mod 11). Here a = 3, b = 9,
and n = 11, so m = 4. In binary, b = (9)2 = 1001. On the first iteration of the
loop, c becomes 3, and a is 9. On the second iteration, c remains the same, and
a becomes (81 mod 11) = 4. On the third iteration, c remains the same, and a is
(16 mod 11) = 5. On the fourth iteration, c becomes ((3 · 5) mod 11) = 4, which is
the correct output.
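A Python rendering of EXP_MOD_n (the least-significant-first bit scan is expressed with shifts rather than a precomputed binary encoding):

```python
def exp_mod(a, b, n):
    """EXP_MOD_n (Algorithm 6.4.1): scan the bits of b from least significant,
    squaring a at each step and multiplying into c when the bit is 1."""
    c = 1
    while b > 0:
        if b & 1:
            c = (c * a) % n
        a = (a * a) % n
        b >>= 1
    return c

print(exp_mod(3, 9, 11))  # 4, as in the worked example
```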
Proposition 6.4.2 EXP_MOD_n has running time O(m^3), where m = ⌊log2(n)⌋ + 1.
Proof The algorithm performs m iterations of the for-next loop. On each iteration,
the algorithm performs at most 2 · O(m^2) steps (Proposition 4.2.6). Thus the running
time is O(m^3). □

Remark 6.4.3 Algorithm 6.4.1 is the analog of Algorithm 4.2.6 with a^2 playing the
role of 2a. □

Remark 6.4.4 Algorithm 6.4.1 is not feasible unless we reduce modulo n; a^b is of
size m = ⌊log2(a^b)⌋ + 1 ≈ b log2(a) bits. Thus a^b grows too fast for a computer
unless we reduce modulo n. This is also why we require finite cyclic groups for use
in the Diffie–Hellman key exchange protocol (see Section 12.1). □


6.4.1 Quadratic Residues

An element a ∈ U(Zn) is a quadratic residue modulo n if there exists x ∈ U(Zn)
for which x^2 ≡ a (mod n). For example, 9 is a quadratic residue modulo 10 since
7^2 ≡ 9 (mod 10).
Let n = p be prime. We define the Legendre symbol as follows:

(a/p) = {  1   if a is a quadratic residue modulo p
          −1   otherwise.

Proposition 6.4.5 (Euler’s Criterion) Let p be an odd prime and let a ∈ Zp,
a ≢ 0 (mod p). Then

(a/p) ≡ a^{(p−1)/2} (mod p).

Proof Suppose that (a/p) = 1. Then x^2 ≡ a (mod p) for some x ∈ Zp with x ≢ 0
(mod p). Thus

a^{(p−1)/2} ≡ (x^2)^{(p−1)/2} ≡ x^{p−1} ≡ 1 (mod p),

where the last equivalence is by Fermat's Little Theorem.
On the other hand, suppose that (a/p) = −1. Then x^2 ≡ a (mod p) has no
solution in Zp. Thus for each residue i in U(Zp), there is a unique residue j ≠ i in
U(Zp) with ij ≡ a (mod p); in fact, j = i^{−1}a. Consequently,

Π_{i=1}^{p−1} i = (p − 1)! ≡ a^{(p−1)/2} (mod p).

By Wilson's theorem (Theorem 6.2.9), (p − 1)! ≡ −1 (mod p), and the result
follows. □

Proposition 6.4.6 Let p be an odd prime and let a, b ∈ U(Zp). Then

(ab/p) = (a/p)(b/p).

Proof This follows from Euler's criterion (Proposition 6.4.5). □


The Jacobi symbol generalizes the Legendre symbol. Let n ≥ 2 be an integer with

    n = p1^e1 · p2^e2 · · · pm^em,

for distinct primes pi and ei ≥ 1. Then for a ∈ U(Zn), the Jacobi symbol is defined as

    (a/n) = (a/p1)^e1 · (a/p2)^e2 · · · (a/pm)^em,

where we take a = (a mod pi) in the Legendre symbol (a/pi) for 1 ≤ i ≤ m.
The Jacobi symbol takes on the values ±1. However, in contrast to the Legendre symbol, (a/n) = 1 does not imply that a is a quadratic residue modulo n. For example, (2/15) = (2/3)(2/5) = (−1)(−1) = 1, yet 2 is not a quadratic residue modulo 15.
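When n is small enough to factor, the Jacobi symbol can be computed straight from this definition, reusing Euler's criterion for each Legendre factor. A sketch (real implementations use quadratic reciprocity instead, which needs no factorization):

```python
def legendre(a, p):
    s = pow(a, (p - 1) // 2, p)   # Euler's criterion, p an odd prime
    return -1 if s == p - 1 else s

def jacobi(a, n):
    """(a/n) for odd n >= 3 with gcd(a, n) = 1, by trial-factoring n
    and multiplying the Legendre symbols ((a mod p)/p)."""
    result, p = 1, 3
    while n > 1:
        while n % p == 0:         # p runs over odd trial divisors
            result *= legendre(a % p, p)
            n //= p
        p += 2
    return result

print(jacobi(2, 15))              # 1, although 2 is not a square mod 15
```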
For n ≥ 2, let QRn denote the set of quadratic residues modulo n. Let

    Jn^(1)  = {a ∈ U(Zn) | (a/n) = 1},
    Jn^(−1) = {a ∈ U(Zn) | (a/n) = −1}.

We have QRn ⊆ Jn^(1). To see this, suppose a ≡ x^2 (mod n) for some integer x. Then a ≡ x^2 (mod p) for any prime p dividing n. Thus (a/p) = 1, and so by the definition of the Jacobi symbol, a ∈ Jn^(1). Of course, if a ∈ Jn^(−1), then a ∉ QRn.
For the case n = p, a prime, there are φ(p) = p − 1 elements in U(Zp). If p = 2, then J2^(1) = {1} and J2^(−1) = ∅. So, we assume that p is an odd prime.
(1) (−1)
Proposition 6.4.7 Let p be an odd prime. Then |Jp^(1)| = |Jp^(−1)| = (p − 1)/2.
Proof Note that Jp^(1) is a subgroup of U(Zp) by Proposition 6.4.6.
Let g be a primitive root modulo p. Then g is not a quadratic residue modulo p since g^((p−1)/2) ≢ 1 (mod p).
Since gcd(2, p − 1) = 2, the order of g^2 is (p − 1)/2 by Proposition 5.7.10. Thus g^2 is a quadratic residue modulo p by Euler's criterion. Since g^2 ∈ Jp^(1) and Jp^(1) is a subgroup, |Jp^(1)| ≥ (p − 1)/2; but since Jp^(1) is a proper subgroup of U(Zp), we conclude that |Jp^(1)| = (p − 1)/2. Since (a/p) = ±1 for all a ∈ U(Zp), |Jp^(−1)| = (p − 1)/2. □
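Proposition 6.4.7 can be checked by exhaustion for any one prime; p = 23 below is my own choice:

```python
p = 23
euler = lambda a: pow(a, (p - 1) // 2, p)       # 1 for residues, p-1 otherwise
J_plus  = [a for a in range(1, p) if euler(a) == 1]
J_minus = [a for a in range(1, p) if euler(a) == p - 1]
print(len(J_plus), len(J_minus))                # each has (p-1)/2 = 11 elements
```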
Proposition 6.4.8 Let n = pq be a product of distinct odd primes. Then
(i) |Jn^(1)| = |Jn^(−1)| = (p − 1)(q − 1)/2.
(ii) Exactly half of the elements of Jn^(1) are quadratic residues modulo n, i.e., QRn ⊆ Jn^(1), |QRn| = (p − 1)(q − 1)/4, and |Jn^(1) \ QRn| = (p − 1)(q − 1)/4.
Proof See Section 6.5, Exercise 19. □
Proposition 6.4.9 Let n = pq be the product of distinct odd primes. Then every element of QRn has exactly four square roots modulo n: x, −x, y, and −y.
Proof Let a ∈ QRn. Then the congruence x^2 ≡ a (mod n) has a solution x ∈ Zn. Let x′ = (x mod p) and x″ = (x mod q). Then the congruence x^2 ≡ a (mod p) has exactly two solutions, x = x′ and x = p − x′, in Zp. Likewise, the congruence x^2 ≡ a (mod q) has exactly two solutions, x = x″ and x = q − x″, in Zq.
Next, apply the Chinese Remainder Theorem (Theorem 5.7.15) to the four systems of congruences

    (i)   x ≡ x′ (mod p),      x ≡ x″ (mod q)
    (ii)  x ≡ x′ (mod p),      x ≡ q − x″ (mod q)
    (iii) x ≡ p − x′ (mod p),  x ≡ x″ (mod q)
    (iv)  x ≡ p − x′ (mod p),  x ≡ q − x″ (mod q)

to obtain four non-congruent solutions to x^2 ≡ a (mod n): x, −x, y, and −y, where x and y are the unique solutions modulo n to (i) and (ii), respectively. □
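The CRT gluing in the proof can be followed numerically. Taking n = 3 · 7 = 21 and a = 4 (values chosen for illustration; pow(q, -1, p) is Python 3.8+ modular inversion):

```python
def crt(rp, p, rq, q):
    """Unique x mod p*q with x ≡ rp (mod p) and x ≡ rq (mod q)."""
    return (rp * q * pow(q, -1, p) + rq * p * pow(p, -1, q)) % (p * q)

p, q, a = 3, 7, 4
xp = next(x for x in range(1, p) if x * x % p == a % p)   # a root of a mod p
xq = next(x for x in range(1, q) if x * x % q == a % q)   # a root of a mod q
roots = sorted(crt(sp, p, sq, q)
               for sp in (xp, p - xp) for sq in (xq, q - xq))
print(roots)   # [2, 5, 16, 19]: the four square roots of 4 modulo 21
```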

How many (if any) of these square roots are in QRn? We can answer this question if we specialize to Blum primes. A Blum prime is a prime p satisfying p ≡ 3 (mod 4).
Proposition 6.4.10 Let n = pq be the product of distinct Blum primes. Let a ∈ QRn. Then a has exactly one square root modulo n in QRn.
Proof By Proposition 6.4.9, there are exactly four square roots of a modulo n: ±x, ±y. Since p is Blum, p = 4m + 3 for some m, hence (−1)^((p−1)/2) = (−1)^(2m+1) ≡ −1 (mod p), and thus (−1/p) = −1 by Proposition 6.4.5. Likewise, (−1/q) = −1.
Note this implies (−x/p) = −(x/p), (−x/q) = −(x/q), (−y/p) = −(y/p), and (−y/q) = −(y/q), by Proposition 6.4.6.
Thus

    (−x/n) = (−x/p)(−x/q) = (−(x/p))(−(x/q)) = (x/p)(x/q) = (x/n).

In a similar manner, we obtain (−y/n) = (y/n).
Now, x^2 − y^2 = (x + y)(x − y) ≡ 0 (mod n), and so nm = pqm = (x + y)(x − y) for some m. Thus p | (x + y)(x − y), and since p is prime, p | (x + y) or p | (x − y). Moreover, q | (x + y) or q | (x − y). We cannot have both p and q dividing the same factor, else n divides that factor, which implies that the square roots ±x and ±y are not distinct. So either p | (x + y) and q | (x − y), or p | (x − y) and q | (x + y).
Without loss of generality, we assume the first case. Then x ≡ −y (mod p), thus x = −y in U(Zp), and so (x/p) = (−y/p) = −(y/p). Moreover, x = y in U(Zq), and thus (x/q) = (y/q). Thus (x/n) = −(y/n).
We conclude that either (±x/n) = 1 and (±y/n) = −1, or (±x/n) = −1 and (±y/n) = 1. Let us assume the first case. Then ±y are not quadratic residues modulo n; also (−x/n) = (x/n) = 1. Now, (x/n) = 1 implies that (x/p) = (x/q) = 1 or that (x/p) = (x/q) = −1, and (−x/n) = 1 implies that (−x/p) = (−x/q) = 1 or that (−x/p) = (−x/q) = −1. Now if (x/p) = 1, then (−x/p) = −1, and vice versa. Consequently, exactly one of ±x, say x, is a quadratic residue modulo both p and q, and hence modulo n. This x is the only square root of a that is contained in QRn. □
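This uniqueness can also be checked by exhaustion; the Blum primes p = 7, q = 11 below are my own small example:

```python
p, q = 7, 11                        # both ≡ 3 (mod 4): Blum primes
n = p * q                           # 77
a = 15 * 15 % n                     # 71, a quadratic residue modulo 77
roots = [x for x in range(n) if x * x % n == a]
in_QR = [x for x in roots
         if any(y * y % n == x for y in range(1, n))]
print(roots, in_QR)                 # four square roots, exactly one in QR_77
```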
6.5 Exercises

1. Let ⟨Z100, +, ·⟩ denote the ring of residues modulo 100.
(a) Compute 67 · (3 + 72).
(b) Show that Z100 is a commutative ring with unity.
2. Find an example of an integral domain that is not a field.
3. Determine whether 3 is a unit in the ring ⟨Q, +, ·⟩.
4. Determine whether A = [5 1; 6 3] is a unit in Mat2(R). Is A a unit in Mat2(Z)?
5. Compute the product [3 6; 9 0][1 4; 7 1] in Mat2(Z11).
6. Let Z20 denote the ring of residues modulo 20.
(a) Find the complete set of units in Z20 .
(b) Compute 7−1 in Z20 .
(c) Compute the group table for U (Z20 ).
7. Use Fermat's Little Theorem to compute (20^12 mod 11).
8. Let φ denote Euler’s function. Compute φ(500).
9. Use Euler's theorem to compute (43^129 mod 256).
10. Let n > 4 be a composite number. Prove that (n − 1)! ≡ 0 (mod n).
11. Find a primitive root modulo 13, that is, find a generator for the cyclic group
U (Z13 ).
12. Show that 5 is a primitive root modulo 23. Use this fact to find all of the
primitive roots modulo 23.
13. Use the method of the algorithm EXP_MOD_n to compute 2^28 mod 31.
14. Show that 2 is a quadratic residue modulo 23.
15. Compute the Legendre symbol (a/11) for a = 2, 3, 4.
16. Compute the Jacobi symbol (a/35) for a = 2, 10, 30.
17. Find all four of the square roots of 4 modulo 35.
18. Let n = 21. Compute J21^(1), J21^(−1), and QR21.
19. Prove Proposition 6.4.8.
Chapter 7
Advanced Topics in Algebra

7.1 Quotient Rings and Ring Homomorphisms

Throughout this section all rings are assumed to be commutative rings with unity.
In this section we introduce ideals, quotient rings, and ring homomorphisms, which
are advanced topics in ring theory needed in computer science and cryptography.

7.1.1 Quotient Rings

Let R be a commutative ring with unity. An ideal of R is a subgroup N of the additive group ⟨R, +⟩ for which rN ⊆ N for all r ∈ R. Every ring has at least two ideals: the trivial ideal {0} and R. For example, in the ring Z, the subgroup 4Z is an ideal since

    n · 4Z = {n · 4m : m ∈ Z} = {4nm : m ∈ Z} ⊆ 4Z

for all n ∈ Z.
Let a be any element of R. Then we can always form the ideal

{ra : r ∈ R}

of R-multiples of a; this is the principal ideal of R generated by a, denoted as (a).
For example, (4) is the principal ideal of Z generated by 4; we have

(4) = {m · 4 : m ∈ Z} = 4Z.

© Springer Nature Switzerland AG 2022
R. G. Underwood, Cryptography for Secure Encryption, Universitext,
https://doi.org/10.1007/978-3-030-97902-7_7

In the ring Q[x],

    (x^2 − 2) = {r(x)(x^2 − 2) : r(x) ∈ Q[x]}

is the principal ideal of Q[x] generated by x^2 − 2.

It is a consequence of Proposition 5.7.1 that every ideal of the ring of integers Z is principal. This is also true for the ring of polynomials over a field.
Proposition 7.1.1 Let F be a field. Then every ideal of F[x] is principal.
Proof Let N be a non-zero ideal of F[x] (clearly, the zero ideal is principal). Let p(x) be a non-zero polynomial of minimal degree in N. Then every element of N is a multiple of p(x): if f(x) is in N, then by the Division Theorem, there exist polynomials q(x) and r(x) in F[x] for which

    f(x) = p(x)q(x) + r(x),

where r(x) = 0 or deg(r(x)) < deg(p(x)). Thus r(x) = f(x) − p(x)q(x) ∈ N, and by the minimality of deg(p(x)), r(x) = 0. □
Let N be an ideal of R and let R/N denote the collection of all additive left
cosets of N in R:

R/N = {r + N : r ∈ R}.

One can endow R/N with the structure of a ring by defining addition and
multiplication on the left cosets. Addition on R/N is defined as

(a + N) + (b + N) = (a + b) + N, (7.1)

and multiplication is given as

(a + N) · (b + N) = ab + N (7.2)

for left cosets a + N, b + N ∈ R/N.


Remark 7.1.2 The reader should verify that these operations are well-defined on left
cosets, i.e., if x+N = a+N and y+N = b+N, then (x+N)+(y+N) = (a+b)+N
and (x + N ) · (y + N) = ab + N; see Section 7.5, Exercise 1. 

The ring R/N endowed with the binary operations (7.1) and (7.2) is the quotient
ring of R by N. The quotient ring R/N is a commutative ring with unity; the
quotient ring R/N inherits these properties from R; the 0 element in R/N is the
coset 0 + N = N , and the unity element is the coset 1 + N .
Example 7.1.3 Let R = Z, N = (4). Then the quotient ring Z/(4) consists of the
cosets {(4), 1+(4), 2+(4), 3+(4)}, together with coset addition and multiplication.
For instance, the sum of cosets 2 + (4), 3 + (4) is defined as

(2 + (4)) + (3 + (4)) = (2 + 3) + (4) = 5 + (4),

but note that the left coset 5 + (4) is equal to the left coset 1 + (4), thus

(2 + (4)) + (3 + (4)) = 1 + (4).

Likewise, the product of cosets is given as

(2 + (4)) · (3 + (4)) = (2 · 3) + (4) = 6 + (4) = 2 + (4).

The complete binary operation tables for Z/(4) are as follows:

    +        | (4)      1 + (4)  2 + (4)  3 + (4)
    ---------+-----------------------------------
    (4)      | (4)      1 + (4)  2 + (4)  3 + (4)
    1 + (4)  | 1 + (4)  2 + (4)  3 + (4)  (4)
    2 + (4)  | 2 + (4)  3 + (4)  (4)      1 + (4)
    3 + (4)  | 3 + (4)  (4)      1 + (4)  2 + (4)

    ·        | (4)      1 + (4)  2 + (4)  3 + (4)
    ---------+-----------------------------------
    (4)      | (4)      (4)      (4)      (4)
    1 + (4)  | (4)      1 + (4)  2 + (4)  3 + (4)
    2 + (4)  | (4)      2 + (4)  (4)      2 + (4)
    3 + (4)  | (4)      3 + (4)  2 + (4)  1 + (4)
Example 7.1.4 Let R = Q[x] and let N = (x^2 − 2) be the principal ideal of Q[x] generated by x^2 − 2. The elements of the quotient ring Q[x]/(x^2 − 2) consist of left cosets computed as follows. Let f(x) ∈ Q[x]. By the Division Theorem, there exist polynomials q(x) and r(x) for which

    f(x) = q(x)(x^2 − 2) + r(x),

with deg(r(x)) < deg(x^2 − 2) = 2. Thus r(x) = a + bx for some a, b ∈ Q. It follows that the elements of Q[x]/(x^2 − 2) are {a + bx + (x^2 − 2) : a, b ∈ Q}.
Note that addition in Q[x]/N, N = (x^2 − 2), is given as

    (a + bx + N) + (c + dx + N) = a + c + (b + d)x + N,
while multiplication is given as

    (a + bx + N) · (c + dx + N) = (a + bx)(c + dx) + N
                                = ac + (ad + bc)x + bd·x^2 + N
                                = ac + (ad + bc)x + 2bd − 2bd + bd·x^2 + N
                                = ac + 2bd + (ad + bc)x + bd(x^2 − 2) + N
                                = ac + 2bd + (ad + bc)x + N.
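This coset arithmetic is easy to model: represent a + bx + N by the pair (a, b) and reduce x^2 to 2 during multiplication, exactly as in the computation above (a small sketch, not from the book):

```python
from fractions import Fraction as Q   # exact rational coefficients

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def mul(u, v):
    """(a + bx + N)(c + dx + N) = ac + 2bd + (ad + bc)x + N."""
    a, b = u
    c, d = v
    return (a * c + 2 * b * d, a * d + b * c)

x = (Q(0), Q(1))                      # the coset x + N
print(mul(x, x))                      # x^2 + N equals the coset 2 + N
```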



Let N and M be ideals of R. Then the sum of ideals, N + M = {a + b : a ∈ N, b ∈ M}, is an ideal of R.
Proposition 7.1.5 Let R be a commutative ring with unity, let N be a proper ideal
of R, and let a ∈ R. Then a + N is a unit of R/N if and only if (a) + N = R.
Proof Suppose (a) + N = R. Since 1 ∈ R, there exist elements r ∈ R and n ∈ N
so that ra + n = 1, and hence ra = 1 − n. Now

(r + N )(a + N) = ra + N
= (1 − n) + N
= (1 + N) + (−n + N)
= (1 + N) + N
= 1 + N,

and thus r + N = (a + N)^(−1).


Conversely, suppose a + N is a unit of R/N. Then 1 + N = ar + N for some
r ∈ R, and so, 1 ∈ ar + N. Thus R ⊆ (a) + N. Since (a) + N ⊆ R, one has
(a) + N = R. 

A maximal ideal is a proper ideal M of R for which there is no proper ideal
N with M ⊂ N ⊂ R. For instance, (3) is a maximal ideal of Z, and (4) is not a
maximal ideal of Z since (4) ⊂ (2) ⊂ Z.
Proposition 7.1.6 Let R be a commutative ring with unity. Then M is a maximal
ideal of R if and only if R/M is a field.
Proof Suppose that R/M is a field and let N be a proper ideal of R with M ⊆ N ⊂ R. If M ≠ N, then there exists an element a ∈ N\M, and hence a + M is a non-zero element of the field R/M. Consequently, a + M is a unit in R/M, and so, by Proposition 7.1.5, R = (a) + M ⊆ N. Thus N = R, which is a contradiction.
For the converse, we suppose that M is maximal. Since R is a commutative ring with unity, so is R/M. So it remains to show that every non-zero element of R/M is a unit. Let a + M ∈ R/M with a ∉ M. Then (a) + M is an ideal of R with M ⊆ (a) + M ⊆ R. But M is maximal, so either (a) + M = M or (a) + M = R. In the former case, a ∈ M, which is a contradiction. Thus (a) + M = R, which says that a + M is a unit in R/M. □

We can use Proposition 7.1.6 to "invent" zeros of polynomials. Let F be a field and let F[x] denote the ring of polynomials over F. Then F[x] is an integral domain (and hence a commutative ring with unity). The units of F[x] consist of all degree 0 polynomials, i.e., polynomials of the form f(x) = c, c ≠ 0.
A non-zero non-unit polynomial p(x) ∈ F[x] is irreducible over F if the factorization

    p(x) = f(x)g(x)

in F[x] implies that either f(x) or g(x) is a unit in F[x]. The non-zero non-unit polynomial p(x) is reducible if it is not irreducible.
For example, x^2 − 2 is an irreducible polynomial in Q[x], while x^2 − 1 is reducible over Q since

    x^2 − 1 = (x + 1)(x − 1),

where neither x + 1 nor x − 1 is a unit in Q[x].
Proposition 7.1.7 Let F be any field and let q(x) be an irreducible polynomial in
F [x]. Then the principal ideal (q(x)) is maximal.
Proof Suppose there exists a proper ideal N of F[x] for which (q(x)) ⊂ N ⊂ F[x]. By Proposition 7.1.1, N = (f(x)) for some f(x) ∈ F[x]. Thus q(x) = r(x)f(x), where neither r(x) nor f(x) is a unit of F[x], and thus q(x) is reducible, a contradiction. □

Proposition 7.1.8 Let q(x) be an irreducible polynomial in F[x]. Then there exists a field extension E/F that contains a zero of q(x).
Proof By Proposition 7.1.7, (q(x)) is a maximal ideal of F[x], and by Proposition 7.1.6, E = F[x]/(q(x)) is a field. We identify F with the collection of cosets {r + (q(x)) : r ∈ F}, and in this way, F ⊆ E. Thus E is a field extension of F.
Now let α = x + (q(x)) ∈ E. Write

    q(x) = a_n x^n + a_{n−1} x^{n−1} + · · · + a_2 x^2 + a_1 x + a_0

for a_i ∈ F, 0 ≤ i ≤ n, a_n ≠ 0. Then

    q(α) = q(x + (q(x)))
         = a_n (x + (q(x)))^n + a_{n−1} (x + (q(x)))^{n−1} + · · · + a_2 (x + (q(x)))^2 + a_1 (x + (q(x))) + a_0
         = a_n (x^n + (q(x))) + a_{n−1} (x^{n−1} + (q(x))) + · · · + a_2 (x^2 + (q(x))) + a_1 (x + (q(x))) + a_0
         = q(x) + (q(x))
         = (q(x)),

and so q(α) = 0 in E. □

We call the coset α an invented root of q(x).
Proposition 7.1.9 Let f (x) ∈ F [x] with deg(f (x)) = d ≥ 1. Then there exists a
field extension E/F so that f (x) factors into a product of linear factors in E[x].
Proof We prove this by induction on d = deg(f(x)).
The Trivial Case: d = 1. In this case f(x) = ax + b ∈ F[x], so we may take E = F.
The Induction Step. We assume that the proposition is true for polynomials of degree d − 1. The polynomial f(x) factors into a product of irreducible polynomials

    f(x) = q_1(x)q_2(x)q_3(x) · · · q_k(x).

If deg(q_i(x)) = 1 for i = 1, 2, . . . , k, then we can take E = F. Else, let j be the smallest index with deg(q_j(x)) > 1. By Proposition 7.1.8, there exists an invented root α of q_j(x) in some field extension L/F. Over L, f(x) factors as

    f(x) = q_1(x)q_2(x) · · · q_{j−1}(x)(x − α)r(x)q_{j+1}(x) · · · q_k(x),

for r(x) ∈ L[x]. Put

    g(x) = q_1(x)q_2(x) · · · q_{j−1}(x)r(x)q_{j+1}(x) · · · q_k(x).

Then deg(g(x)) = d − 1. By the induction hypothesis, there exists a field extension E/L so that g(x) factors into a product of linear factors in E[x]. Since f(x) = g(x)(x − α), f(x) factors into a product of linear factors in E[x]. □


Proposition 7.1.9 says that E contains d roots of f(x), which may or may not be distinct. These are the only possible zeros of f(x), since a degree d polynomial over a field can have at most d roots in the field (Proposition 6.1.7).
7.1.2 Ring Homomorphisms

Definition 7.1.10 Let R and R′ be commutative rings with unity elements 1_R ∈ R, 1_{R′} ∈ R′. A function f : R → R′ is a ring homomorphism if
(i) f(1_R) = 1_{R′},
(ii) f(a + b) = f(a) + f(b),
(iii) f(ab) = f(a)f(b)
for all a, b ∈ R.
For example, let n ≥ 1 be an integer and let (n) be the principal ideal of Z
generated by n. The map f : Z → Z/(n) defined as f (a) = a + (n) is a
homomorphism of commutative rings with unity. To see this, we show that the
conditions of Definition 7.1.10 hold. Indeed, f (1) = 1 + (n), so (i) holds. For
(ii), f (a + b) = a + b + (n) = a + (n) + b + (n) = f (a) + f (b). For (iii),
f (ab) = ab + (n) = (a + (n)) · (b + (n)) = f (a)f (b).
Here is another example of a ring homomorphism.
Proposition 7.1.11 Let E/F be a field extension and let α ∈ E. Then the function fα : F[x] → E defined by fα(p(x)) = p(α) is a ring homomorphism.
Proof Let p(x) = Σ_{i=0}^{m} a_i x^i and q(x) = Σ_{j=0}^{n} b_j x^j be polynomials in F[x]. Then

    fα(p(x) + q(x)) = fα(Σ_{i=0}^{m} a_i x^i + Σ_{j=0}^{n} b_j x^j)
                    = Σ_{i=0}^{m} a_i α^i + Σ_{j=0}^{n} b_j α^j
                    = fα(Σ_{i=0}^{m} a_i x^i) + fα(Σ_{j=0}^{n} b_j x^j)

and

    fα(p(x)q(x)) = fα(Σ_{i=0}^{m} Σ_{j=0}^{n} a_i b_j x^{i+j})
                 = Σ_{i=0}^{m} Σ_{j=0}^{n} a_i b_j α^{i+j}
                 = fα(Σ_{i=0}^{m} a_i x^i) · fα(Σ_{j=0}^{n} b_j x^j).

Moreover, fα(1_{F[x]}) = 1_F = 1_E, and so fα is a homomorphism of rings with unity. □

The homomorphism of Proposition 7.1.11 is called the evaluation homomorphism. For an example, let F = Q, E = R, and α = √2. Then the evaluation homomorphism f_√2 : Q[x] → R is given by p(x) ↦ p(√2).
Definition 7.1.12 The kernel of the ring homomorphism f : R → R′ is the subset of R defined as ker(f) = {a ∈ R : f(a) = 0}.
Proposition 7.1.13 The kernel of a ring homomorphism f : R → R′ is an ideal of R.
Proof Let N = ker(f). Then N is an additive subgroup of R (proof?), so we need only show that aN ⊆ N for all a ∈ R. But this condition follows since f(an) = f(a)f(n) = f(a) · 0 = 0 for an ∈ aN. □

Definition 7.1.14 A ring homomorphism f : R → R′ is an isomorphism of rings if f is a bijection. The rings R and R′ are isomorphic if there exists an isomorphism f : R → R′. We then write R ≅ R′.
Theorem 7.1.15 Let f : R → R′ be a homomorphism of rings. Let ker(f) = N. Then g : R/N → f(R) defined by g(a + N) = f(a) is an isomorphism of rings.
Proof One first checks that f(R) is a ring. We then check that g is well-defined on cosets. Let a + N = b + N. Then a = b + n for some n ∈ N. Hence f(a) = f(b + n) = f(b) + f(n) = f(b), and so g(a + N) = g(b + N). Thus g is a function on cosets.
Now, g(1 + N) = f(1_R) = 1_{R′}, g((a + N) + (b + N)) = g(a + b + N) = f(a + b) = f(a) + f(b) = g(a + N) + g(b + N), and

    g((a + N)(b + N)) = g(ab + N) = f(ab) = f(a)f(b) = g(a + N)g(b + N).

Finally, g is onto f(R) by construction, and g is 1–1: if g(a + N) = f(a) = 0, then a ∈ N, so a + N = N. Thus g is an isomorphism of rings. □
For example, the surjective (onto) ring homomorphism f : Z → Zn, a ↦ (a mod n), induces the ring isomorphism g : Z/nZ → Zn, a + nZ ↦ (a mod n).
Here is an important example of a ring isomorphism.
Proposition 7.1.16 Let n1 , n2 > 0 be integers with gcd(n1 , n2 ) = 1. Then there is
an isomorphism of rings Zn1 n2 → Zn1 × Zn2 .
Proof Define a map ψ : Z → Zn1 × Zn2 by the rule

ψ(a) = ((a mod n1 ), (a mod n2 )).



Then ψ is a ring homomorphism since, for a, b ∈ Z,

ψ(a + b) = (((a + b) mod n1 ), ((a + b) mod n2 ))


= ((a mod n1 ) + (b mod n1 ), (a mod n2 ) + (b mod n2 ))
= ((a mod n1 ), (a mod n2 )) + ((b mod n1 ), (b mod n2 ))
= ψ(a) + ψ(b).

Moreover,

ψ(ab) = ψ(a)ψ(b).

Now, ker(ψ) is a cyclic subgroup of Z generated by a common multiple of n1, n2, which in fact is the least common multiple lcm(n1, n2). Since gcd(n1, n2) = 1, lcm(n1, n2) = n1·n2 by Proposition 5.7.9.
By Theorem 7.1.15, there is an injective ring homomorphism

    g : Z/n1n2Z → Zn1 × Zn2

defined as g(a + n1n2Z) = ((a mod n1), (a mod n2)).
Observe that Z/n1n2Z ≅ Zn1n2, and so we can write this injective homomorphism as g : Zn1n2 → Zn1 × Zn2.
Since Zn1n2 and Zn1 × Zn2 have the same number of elements, g is a ring isomorphism. □
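A concrete instance of Proposition 7.1.16, with n1 = 4 and n2 = 9 (my own choice), checking that ψ is a bijection that respects both operations:

```python
n1, n2, n = 4, 9, 36                  # gcd(4, 9) = 1
psi = lambda a: (a % n1, a % n2)

# psi hits all n1*n2 pairs exactly once, so it is a bijection:
assert len({psi(a) for a in range(n)}) == n

# and it respects + and * componentwise:
for a in range(n):
    for b in range(n):
        sa, sb = psi(a), psi(b)
        assert psi((a + b) % n) == ((sa[0] + sb[0]) % n1, (sa[1] + sb[1]) % n2)
        assert psi((a * b) % n) == ((sa[0] * sb[0]) % n1, (sa[1] * sb[1]) % n2)
print("Z36 -> Z4 x Z9 is a ring isomorphism (verified by exhaustion)")
```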

Proposition 7.1.17 Let R be a commutative ring with unity 1_R. Then the map ε : Z → R defined as ε(n) = n·1_R = 1_R + 1_R + · · · + 1_R (n summands) is a homomorphism of commutative rings with unity.
Proof For m, n ∈ Z, ε(m + n) = (m + n)1_R = m·1_R + n·1_R = ε(m) + ε(n), and ε(mn) = mn·1_R = m(1_R · ε(n)) = (m·1_R)ε(n) = ε(m)ε(n). Also, ε(1_Z) = 1·1_R = 1_R, and so ε is a homomorphism of commutative rings with unity. □
The kernel of ε : Z → R is an ideal of Z of the form rZ for some integer r ≥ 0. The integer r is the characteristic of the ring R and is denoted as char(R).
Corollary 7.1.18 Let R be a ring with unity with r = char(R). If r = 0, then R contains a (sub)ring isomorphic to Z. If r > 0, then R contains a subring isomorphic to Zr.
Proof Note that ε(Z) is a subring of R. If r = char(R) = 0, then Z ≅ Z/{0} is isomorphic to ε(Z) by Theorem 7.1.15. If r = char(R) > 0, then Zr ≅ Z/rZ is isomorphic to ε(Z) (again by Theorem 7.1.15). □

7.2 Simple Algebraic Extensions

Theorem 7.1.15 can be applied in the following important way.


Let F be a field contained in some larger extension field E (perhaps F = Q and
E = C). Let F [x] be the ring of polynomials over F and let α be an element of E
that is a zero of some polynomial g(x) in F [x] with deg(g(x)) ≥ 1.
Let fα : F[x] → E be the evaluation homomorphism, q(x) ↦ q(α) ∈ E. The kernel of fα is the ideal N of F[x] consisting of all polynomials in F[x] for which α is a zero. By Proposition 7.1.1, N is a principal ideal of the form N = (p(x)) for some p(x) ∈ F[x]. In fact, p(x) is the monic irreducible polynomial of smallest degree with p(α) = 0. Since p(x) is irreducible, (p(x)) is a maximal ideal of F[x].
By Proposition 7.1.6, F[x]/(p(x)) is a field, and by Theorem 7.1.15 there exists an isomorphism

    g : F[x]/(p(x)) → fα(F[x]),

defined as h(x) + (p(x)) ↦ fα(h(x)) = h(α), for h(x) ∈ F[x].
Thus the image fα(F[x]) is a field. In the case that h(x) = c for c ∈ F, we have g(c + (p(x))) = c, and so F is a subfield of fα(F[x]), and fα(F[x]) is a field extension of F.
Definition 7.2.1 The field fα (F [x]), denoted as F (α), is the simple algebraic
extension of F by α. The degree of F (α) is the degree of the irreducible polynomial
p(x).
We want to give an explicit description of the field F (α).
Proposition 7.2.2 The simple algebraic field extension F(α) of F is a vector space over F of dimension n equal to the degree of F(α). An F-basis for F(α) is {1, α, α^2, . . . , α^{n−1}}.
Proof Let n = deg(p(x)) and let h(x) ∈ F[x]. By Proposition 6.1.5, we have

    h(x) = p(x)q(x) + r(x)

for q(x) and r(x) with deg(r(x)) < deg(p(x)). Write

    r(x) = a_0 + a_1 x + a_2 x^2 + · · · + a_{n−1} x^{n−1},

a_i ∈ F. Then the coset h(x) + (p(x)) can be written as the coset r(x) + (p(x)). Thus, every element of F(α) can be written in the form

    r(α) = a_0 + a_1 α + a_2 α^2 + · · · + a_{n−1} α^{n−1},

for a_i ∈ F. This says that the F-span of the vectors

    B = {1, α, α^2, . . . , α^{n−1}}

is F(α).
Now, the set B is linearly independent. For if not, then there exist an integer m, 1 ≤ m ≤ n − 1, and elements a_i, 0 ≤ i ≤ m − 1, not all zero, for which

    α^m = a_{m−1} α^{m−1} + a_{m−2} α^{m−2} + · · · + a_1 α + a_0.

But this says that the kernel of fα contains a non-zero polynomial of degree < n, a contradiction.
Thus B is a linearly independent spanning set for F(α) and consequently is an F-basis for F(α). □

Example 7.2.3 We take F = Q, E = R, and α = √2; √2 is a zero of the polynomial x^2 − 2 over Q. We have the evaluation homomorphism

    f_√2 : Q[x] → R,

whose kernel is (x^2 − 2). Thus,

    Q[x]/(x^2 − 2) ≅ f_√2(Q[x]) = Q(√2).

A "power" basis for the simple algebraic field extension Q(√2) is {1, √2}; the degree of Q(√2) over Q is 2. We have

    Q(√2) = {a + b√2 : a, b ∈ Q}.




Example 7.2.4 We take F = R, E = C, and α = i = √−1; i is a zero of the polynomial x^2 + 1 over R. We have the evaluation homomorphism

    f_i : R[x] → C,

whose kernel is (x^2 + 1). Thus,

    R[x]/(x^2 + 1) ≅ f_i(R[x]) = R(i).

A power basis for the simple algebraic field extension R(i) is {1, i}; the degree of R(i) over R is 2. We have

    R(i) = {a + bi : a, b ∈ R},

which is well-known to be the field of complex numbers C.
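The coset arithmetic of R(i) matches Python's built-in complex numbers, which is a quick way to see R(i) ≅ C (a sketch; floating-point coefficients stand in for R):

```python
def mul(u, v):
    """(a + bi)(c + di) with i^2 reduced to -1."""
    a, b = u
    c, d = v
    return (a * c - b * d, a * d + b * c)

z = mul((3.0, 4.0), (1.0, -2.0))   # (3 + 4i)(1 - 2i)
w = (3 + 4j) * (1 - 2j)            # the same product as a Python complex
print(z, w)                        # (11.0, -2.0) and (11-2j)
```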

7.2.1 Algebraic Closure

Let E be an extension field of F. An element α ∈ E is algebraic over F if α is a zero of a polynomial g(x) ∈ F[x]. The extension E/F is an algebraic extension of F if every element of E is algebraic over F. For instance, Q(√2) is an algebraic extension of Q (can you prove this?).
A field F is algebraically closed if every polynomial g(x) ∈ F [x] has a zero in
F . One familiar example of an algebraically closed field is C.
An algebraic closure of F is an algebraic extension of F that is algebraically
closed. Every field F has (essentially) a unique algebraic closure which we denote
as F ; it is the largest algebraic extension of F . For example, if F = R, then R = C.

7.3 Finite Fields

A finite field is a field with a finite number of elements. If p is a prime number, then
there exists a field with exactly p elements.
Proposition 7.3.1 Let p be prime. Then Zp is a field.
Proof Certainly, Zp is a commutative ring with unity, 1. By Proposition 6.2.3, Zp
is a division ring. 

It turns out that the number of elements in any finite field is always a power of a
prime number.
Proposition 7.3.2 Let F be a field with a finite number of elements. Then |F| = p^n, where p is a prime number and n ≥ 1 is an integer.
Proof Since F is finite, r = char(F) > 0, and hence by Corollary 7.1.18, F contains a subring B isomorphic to Zr. Henceforth, we identify B with Zr. Since F is a field, r must be a prime number p, and hence F contains the field Zp. As F is finite, it is certainly a finite-dimensional vector space over Zp, with scalar multiplication Zp × F → F given by multiplication in F. Thus F ≅ Zp ⊕ Zp ⊕ · · · ⊕ Zp (n copies), where n = dim(F), and hence |F| = p^n. □
In Theorem 6.3.1, we showed that U(Zp) is cyclic. This can be extended to finite fields.
Proposition 7.3.3 Let F be a field with a finite number of elements. Then the multiplicative group of units of F, U(F) = F^×, is cyclic.
Proof Since F is a commutative ring with unity, U(F) is a finite abelian group whose identity element is 1. Let f be the exponent of U(F). We have f ≤ |U(F)| = p^n − 1 for some n ≥ 1. Consider the polynomial x^f − 1 in F[x]. By the definition of exponent, g^f = 1 for all g ∈ U(F), and so x^f − 1 has p^n − 1 distinct zeros in F. Thus f = p^n − 1 by Proposition 6.1.7.
By Proposition 5.7.13, there exists an element g ∈ U(F) which has order p^n − 1. But then |⟨g⟩| = p^n − 1, so that ⟨g⟩ = U(F). Thus U(F) is cyclic. □

Proposition 7.3.4 Let F be a field with a finite number of elements. Then F is isomorphic to a simple algebraic extension of Zp for some prime number p, and |F| = p^n, where n is the degree of the simple algebraic extension of Zp.
Proof By Proposition 7.3.2, |F| = p^n for some prime p and integer n ≥ 1; F contains the subfield Zp. By Proposition 7.3.3, the group of units of F, F^×, is cyclic of order p^n − 1, generated by some α.
Let fα : Zp[x] → F denote the evaluation homomorphism with ker(fα) = (q(x)), where q(x) is monic and irreducible of degree m in Zp[x], with root α. Then Zp[x]/(q(x)) ≅ Zp(α) is a simple algebraic extension of Zp of degree m over Zp.
Let g : Zp[x]/(q(x)) → F be defined by r(x) + (q(x)) ↦ r(α). Since r(x) − s(x) ∈ (q(x)) implies that r(α) = s(α), g is well-defined on cosets. Clearly, g is onto, and since Zp[x]/(q(x)) is a field, g is 1–1. Hence g is an isomorphism of fields. The set {1, x, x^2, . . . , x^{m−1}} is a Zp-basis for Zp[x]/(q(x)), and so |F| = p^m. But we already know that |F| = p^n, and hence m = n. □

Let p be a prime and let n ≥ 1. In what follows, we address the existence of finite fields of order p^n; we show how to explicitly construct a finite field with exactly p^n elements.
In the case n = 1, there exists a field with exactly p^1 = p elements, namely, Zp. For n ≥ 1, let

    f(x) = x^(p^n) − x ∈ Zp[x].

By Proposition 7.1.9, there exists a field extension E/Zp that contains all p^n zeros of f(x) (counting multiplicities).
Proposition 7.3.5 The zeros of f(x) = x^(p^n) − x in E are distinct.
Proof Let F = {αi}, 1 ≤ i ≤ p^n, be the set of roots of f(x), and suppose that some root αi has multiplicity ≥ 2. Then f′(αi) = 0. But this is impossible since the formal derivative f′(x) = −1 in Zp[x]. □

Proposition 7.3.6 Let F = {αi}, 1 ≤ i ≤ p^n, be the set of roots of f(x) = x^(p^n) − x. Then F is a field, with operations induced from E.
Proof By Corollary 6.2.7, the elements of Zp are roots of f(x). Thus Zp ⊆ F and char(F) = p. Let αi, αj ∈ F. Then (αi + αj)^(p^n) = αi^(p^n) + αj^(p^n) = αi + αj (use the binomial theorem and Corollary 6.2.7). Thus F is closed under addition. Moreover, (−αi)^(p^n) = (−1)^(p^n) αi^(p^n) = −αi, since (−1)^(p^n) ≡ −1 (mod p) by Corollary 6.2.7. Hence F is an additive subgroup of E. Also, (αi·αj)^(p^n) = αi^(p^n) αj^(p^n) = αi·αj, so that F is closed under multiplication. Thus F is a commutative ring with unity. For αi ∈ F non-zero, αi^(−1) ∈ E. But also, (αi^(−1))^(p^n) = αi^(−1). Thus F is a field. □
By Proposition 7.3.6, there is a field F of order p^n, consisting of the p^n distinct roots of x^(p^n) − x. The following proposition shows that there is essentially only one finite field of order p^n.
Proposition 7.3.7 Let F1 and F2 be finite fields of order p^n. Then F1 ≅ F2.
Proof We show that F1 ≅ F, where F is the field constructed in Proposition 7.3.6 consisting of the roots of x^(p^n) − x ∈ Zp[x]. By Proposition 7.3.4, F1 is isomorphic to a simple algebraic extension of Zp, Zp[x]/(q(x)) ≅ Zp(α), where deg(q(x)) = n, and α is a root of both q(x) and x(x^(p^n − 1) − 1) = x^(p^n) − x. Hence α ∈ F. Define f : Zp[x]/(q(x)) → F by the rule r(x) + (q(x)) ↦ r(α). Then f is well-defined on cosets (f is a function). Moreover, f is 1–1 since Zp[x]/(q(x)) is a field, and onto since both Zp[x]/(q(x)) and F have the same number of elements. Thus f is an isomorphism. It follows that F1 ≅ F. Similarly, F2 ≅ F, which proves the result. □
The unique (up to isomorphism) finite field of order p^n is called the Galois field of order p^n and is denoted by GF(p^n), or more simply, by Fpn. For a prime number p, GF(p) = Fp = Zp.
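A Galois field can be realized exactly this way in a few lines. The sketch below builds GF(8) as F2[x]/(x^3 + x + 1) with elements encoded as 3-bit integers (the modulus choice and the encoding are mine), and illustrates Proposition 7.3.3 by exhibiting a generator of the unit group:

```python
def gf8_mul(u, v, mod=0b1011):        # modulus x^3 + x + 1, irreducible over F2
    """Multiply in GF(8) = F2[x]/(x^3 + x + 1); bit i of an element
    is the coefficient of x^i."""
    r = 0
    while v:
        if v & 1:
            r ^= u                    # add (XOR) the current shift of u
        v >>= 1
        u <<= 1
        if u & 0b1000:                # degree reached 3: reduce by the modulus
            u ^= mod
    return r

# alpha = x (encoded 0b010) generates the cyclic group GF(8)^x of order 7:
alpha, t, powers = 0b010, 1, []
for _ in range(7):
    t = gf8_mul(t, alpha)
    powers.append(t)
print(sorted(powers))                 # all seven non-zero elements appear
```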
We next discuss polynomials over the Galois field Fpn.
Proposition 7.3.8 Let f(x) ∈ Fpn[x] with deg(f(x)) = k ≥ 1. Assume that f(0) ≠ 0. Then there exists an integer t, 1 ≤ t ≤ p^(nk) − 1, for which f(x) | x^t − 1.
Proof First note that the quotient ring Fpn[x]/(f(x)) contains p^(nk) − 1 left cosets other than (f(x)). The collection {x^i + (f(x)) : i = 0, 1, 2, . . . , p^(nk) − 1} is a set of p^(nk) left cosets not containing (f(x)). Thus

    x^i + (f(x)) = x^j + (f(x))

for some i, j, 0 ≤ i < j ≤ p^(nk) − 1. Since f(0) ≠ 0, there exist polynomials r(x), s(x) ∈ Fpn[x] so that

    x·r(x) + f(x)·s(x) = 1.

Consequently, x^i + (f(x)) is a unit in Fpn[x]/(f(x)). Set t = j − i. It follows that x^t ∈ 1 + (f(x)), and hence f(x) | x^t − 1 with 1 ≤ t ≤ p^(nk) − 1. □
The smallest positive integer e for which f(x) | (x^e − 1) is the order of f(x), denoted as ord(f(x)). For example, over F2, ord(x^4 + x + 1) = 15 and ord(x^4 + x^3 + x^2 + x + 1) = 5. Over F9 = F3(α), α^2 + 1 = 0, ord(x − α) = 4 since

    x^4 − 1 = (x − 1)(x + 1)(x − α)(x + α)

over F9.
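Over F2 the order of a polynomial is easy to compute by repeatedly multiplying by x modulo f(x); this sketch (the bit-vector encoding is again my own) confirms both F2 examples:

```python
def mod2(a, m):
    """Remainder of polynomial a modulo m over F2 (ints as bit-vectors)."""
    d = m.bit_length() - 1
    while a and a.bit_length() - 1 >= d:
        a ^= m << (a.bit_length() - 1 - d)
    return a

def order(f):
    """Smallest e >= 1 with f(x) | x^e - 1 over F2; assumes f(0) != 0."""
    e, t = 1, mod2(0b10, f)           # t = x mod f
    while t != 1:
        t = mod2(t << 1, f)           # multiply by x, reduce mod f
        e += 1
    return e

print(order(0b10011), order(0b11111))   # 15 and 5, as claimed above
```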
Proposition 7.3.9 Let f(x) be an irreducible polynomial in Fpn[x] of degree k. Then f(x) divides x^(p^(nm)) − x if and only if k | m.
Proof Suppose that f(x) divides x^(p^(nm)) − x. By Proposition 7.1.8, there exists a field extension E/Fpn that contains a zero α of f(x). Thus α^(p^(nm)) − α = 0, and so α ∈ Fpnm. Moreover, the elements of Fpn are precisely the zeros of x^(p^n) − x, and any zero of x^(p^n) − x is also a zero of x^(p^(nm)) − x. Thus F = Fpn(α) is a subfield of Fpnm. Note that F has p^(nk) elements.
Now let β be a generator of the cyclic group Fpnm^× and let q(x) be the irreducible polynomial of β over F. Let t = deg(q(x)). Then F(β) = Fpnm. Now F(β) has p^(nkt) elements, and so p^(nkt) = p^(nm). Hence kt = m, that is, k | m.
For the converse, suppose that k | m. Let α be a zero of f(x) in some extension field E/Fpn. Then Fpn(α) is a field with p^(nk) elements, and α satisfies the relation α^(p^(nk)) − α = 0. Let s be so that ks = m. Then α^(p^(nks)) − α = 0, and hence α^(p^(nm)) − α = 0. Consequently, α is a root of x^(p^(nm)) − x. It follows that x^(p^(nm)) − x is in (f(x)), the kernel of the evaluation homomorphism fα : Fpn[x] → E, and so f(x) | x^(p^(nm)) − x. □

Proposition 7.3.10 Let f (x) be an irreducible polynomial in Fpn [x] of degree k.
Let α be a zero of f (x) in an extension field E/Fpn . Then the zeros of f (x) are of
the form α, α^{p^n}, α^{p^{2n}}, . . . , α^{p^{(k−1)n}}.

Proof We already know that α ∈ E is one zero of f (x). Since the characteristic of
Fpn is p,

f (α^{p^{jn}}) = f (α)^{p^{jn}} = 0,

for 1 ≤ j ≤ k − 1. □

Proposition 7.3.11 Let f (x) be an irreducible polynomial in Fpn [x] of degree k.
Let α be a zero of f (x) in an extension field E/Fpn . The smallest field extension
containing all of the zeros of f (x) is Fpn (α), which is isomorphic to the Galois field
Fpnk .
Proof By Proposition 7.3.10, Fpn (α) is the smallest field containing all of the zeros
of f (x). We have |Fpn (α)| = pnk , and thus by Proposition 7.3.7, Fpn (α) ∼= Fpnk .


Let f (x) be an irreducible polynomial in Fpn [x] of degree k ≥ 1, and let α be
a zero of f (x). As we have seen in Proposition 7.3.11, Fpn (α) contains all of the
roots of f (x). The following proposition computes ord(f (x)).
Proposition 7.3.12 Let f (x) be an irreducible polynomial in Fpn [x] of degree k ≥
1, and let α be a zero of f (x) in an extension field E. Then ord(f (x)) equals the
order of any root of f (x) in the group of units of Fpn (α).
132 7 Advanced Topics in Algebra

Proof By Proposition 7.3.3, Fpn (α)^× is cyclic of order p^{nk} − 1, generated by some
element β. Put α = β^l for some integer l. Now a typical zero of f (x) can be written
β^{lp^{mn}} for 0 ≤ m ≤ k − 1. By Proposition 5.7.10,

|β^{lp^{mn}}| = (p^{nk} − 1) / gcd(p^{nk} − 1, lp^{mn}).

Since gcd(p^{nk} − 1, p^{mn}) = 1, the right-hand side above depends only on l, and so
each zero of f (x) has the same order.
Let e be the order of α in Fpn (α)^× (the smallest positive integer e so that α^e = 1).
Then α is a zero of x^e − 1. Thus x^e − 1 ∈ (f (x)), since (f (x)) is the kernel of the
evaluation homomorphism

φα : Fpn [x] → E.

Thus f (x) | x^e − 1, so that ord(f (x)) ≤ e. Conversely, α is a zero of x^{ord(f (x))} − 1,
and hence e ≤ ord(f (x)). It follows that e = ord(f (x)). □



An irreducible polynomial f (x) of degree k in Fpn [x] is primitive if
ord(f (x)) = p^{nk} − 1. Equivalently, an irreducible, degree k polynomial f (x) is
primitive if there exists a root α of f (x) for which ⟨α⟩ = Fpn (α)^×. For example, over
F2 , every irreducible polynomial of degree k = 3, 5, 7 is primitive, since the group
of units F_{2^k}^× has prime order 2^k − 1 for k = 3, 5, 7.
We give some examples of Galois fields of order pn and primitive and non-
primitive polynomials.
Example 7.3.13 Let p = 5 and n = 1. The Galois field F5 = Z5 contains all of the
distinct roots of x 5 − x ∈ F5 [x] by Corollary 6.2.7. Indeed, over F5 ,

x 5 − x = x(x − 1)(x − 2)(x − 3)(x − 4).

Observe that 2 has order 4 in U (F5 ) = F5^× and 4 has order 2 in F5^×. Hence ord(x −
2) = 4 and ord(x − 4) = 2; x − 2 is a primitive polynomial over F5 . □

Example 7.3.14 Let p = 3 and n = 2. Then F9 consists of all of the roots of x 9 −x.
In F3 [x], x 9 − x factors into irreducible elements as

x 9 − x = x(x − 1)(x + 1)(x 2 + 1)(x 2 − x − 1)(x 2 + x − 1).

We take α to be a root of x^2 + 1 (the other root is α^3 = 2α). Thus

F9 ≅ F3 [x]/(x^2 + 1) ≅ F3 (α).

An F3 -basis for F3 (α) is {1, α}. The 9 elements of F9 are thus

0 = 0 · 1 + 0 · α,
α = 0 · 1 + 1 · α,
7.3 Finite Fields 133

2α = 0 · 1 + 2 · α,
1 = 1 · 1 + 0 · α,
1 + α = 1 · 1 + 1 · α,
1 + 2α = 1 · 1 + 2 · α,
2 = 2 · 1 + 0 · α,
2 + α = 2 · 1 + 1 · α,
2 + 2α = 2 · 1 + 2 · α.

Both α and α^3 = 2α have order 4 in F9^×, and so ord(x^2 + 1) = 4. Thus x^2 + 1 is a non-
primitive polynomial over F3 . On the other hand, a root β of x^2 − x − 1 ∈ F3 [x]
has order 8 in F9^×, and so x^2 − x − 1 is a primitive polynomial over F3 . □

Example 7.3.15 In this example, we take F9 = F3 (α), α^2 + 1 = 0, as our base field.
Let f (x) = x^2 + x + β ∈ F9 [x] with β as in Example 7.3.14. Then one checks
directly that f (x) is irreducible over F9 . By Proposition 7.3.10, the (distinct) roots
of f (x) are γ and γ^9 , and F9 (γ ) = F81 . We have

(x − γ )(x − γ^9 ) = x^2 + x + β,

so that γ^{10} = β. Since β has order 8 in F9^×, γ has order 80 in F81^×. Thus f (x) is
primitive over F9 . □

The finite fields F2n , n ≥ 1, are of use in computer science since their base field
F2 = {0, 1} represents the collection of binary digits (bits).
Example 7.3.16 Consider F16 . The polynomial x 16 − x ∈ F2 [x] factors into
irreducibles as

x(x + 1)(x 2 + x + 1)(x 4 + x + 1)(x 4 + x 3 + 1)(x 4 + x 3 + x 2 + x + 1). (7.3)

So F16 can be constructed as F2 [x]/(x 4 + x + 1) = F2 (α), where α is a root of


x 4 + x + 1. An F2 -basis for F16 is {1, α, α 2 , α 3 } and so the 16 elements are

0 = 0 · 1 + 0 · α + 0 · α2 + 0 · α3,
α3 = 0 · 1 + 0 · α + 0 · α2 + 1 · α3,
α2 = 0 · 1 + 0 · α + 1 · α2 + 0 · α3,
α2 + α3 = 0 · 1 + 0 · α + 1 · α2 + 1 · α3,
α = 0 · 1 + 1 · α + 0 · α2 + 0 · α3,
α + α3 = 0 · 1 + 1 · α + 0 · α2 + 1 · α3,
α + α2 = 0 · 1 + 1 · α + 1 · α2 + 0 · α3,
134 7 Advanced Topics in Algebra

α + α2 + α3 = 0 · 1 + 1 · α + 1 · α2 + 1 · α3,
1 = 1 · 1 + 0 · α + 0 · α2 + 0 · α3,
1 + α3 = 1 · 1 + 0 · α + 0 · α2 + 1 · α3,
1 + α2 = 1 · 1 + 0 · α + 1 · α2 + 0 · α3,
1 + α2 + α3 = 1 · 1 + 0 · α + 1 · α2 + 1 · α3,
1 + α = 1 · 1 + 1 · α + 0 · α2 + 0 · α3,
1 + α + α3 = 1 · 1 + 1 · α + 0 · α2 + 1 · α3,
1 + α + α2 = 1 · 1 + 1 · α + 1 · α2 + 0 · α3,
1 + α + α2 + α3 = 1 · 1 + 1 · α + 1 · α2 + 1 · α3.

In fact, f (x) = x^4 + x + 1 is a primitive polynomial over F2 : from the
factorization (7.3), f (x) is irreducible. By Proposition 7.3.12, ord(f (x)) = 3,
5, or 15. But clearly f (x) ∤ x^3 − 1 and f (x) ∤ x^5 − 1, and thus ord(f (x)) = 15,
which says that f (x) is primitive. □

The Galois field F16 is the field consisting of all possible half-bytes (strings of
0’s and 1’s of length 4). The addition is given by bit-wise addition modulo 2, and
the multiplication is induced by the relation α 4 = 1 + α. For example,

0110 + 1100 = 1010,

and

0110 · 1001 = 1100,

since (α + α 2 )(1 + α 3 ) = 1 + α.
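These half-byte computations can be mirrored in code. In the sketch below (an illustration of ours, not from the text), an element of F16 is a 4-bit integer whose bit i is the coefficient of α^i — so the book's bit strings, which list the coefficient of 1 first, are read in reverse. Multiplication is carryless polynomial multiplication over F2 followed by reduction modulo x^4 + x + 1:

```python
def gf16_mul(a, b):
    """Multiply in F16 = F2[x]/(x^4 + x + 1); bit i holds the coefficient of alpha^i."""
    prod = 0
    for i in range(4):           # carryless multiply over F2
        if (b >> i) & 1:
            prod ^= a << i
    for deg in range(6, 3, -1):  # reduce modulo x^4 + x + 1 (binary 10011)
        if (prod >> deg) & 1:
            prod ^= 0b10011 << (deg - 4)
    return prod

# Addition is bitwise XOR, e.g. (alpha + alpha^2) + (1 + alpha) = 1 + alpha^2.
```

For the book's product 0110 · 1001 = 1100, the strings reverse to 0b0110 = α + α^2 and 0b1001 = 1 + α^3, and gf16_mul(0b0110, 0b1001) returns 0b0011 = 1 + α, i.e., 1100 in the book's bit order.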

7.4 Invertible Matrices over Zpq

We close the chapter with some material needed in the construction of the Hill cipher
(Section 8.3).
Let p, q > 0 be distinct primes and let Zpq be the ring of residues. As shown
in Section 6.1, the set of n × n matrices Matn (Zpq ) is a ring with unity under
ordinary matrix addition and multiplication. The unity in Matn (Zpq ) is the n × n
identity matrix In . The group of units of Matn (Zpq ), U (Matn (Zpq )), is the group of
invertible n × n matrices GLn (Zpq ).
In this section, we compute the number of elements in GLn (Zpq ).

Lemma 7.4.1 Let p be prime and let GLn (Zp ) denote the group of invertible
n × n matrices over Zp . Then

|GLn (Zp )| = p^{n^2} ∏_{i=1}^{n} (1 − 1/p^i ).

Proof We view the matrix ring Matn (Zp ) as a Zp -vector space W of dimension n2 .
We construct an invertible matrix column by column and count the possibilities
for each column. Since Zp is a field, the first column is an arbitrary non-zero vector
over Zp . This yields pn − 1 choices for the first column. The first column spans a
one-dimensional subspace W1 of W containing p elements.
The second column must be chosen so that the first and second columns are
linearly independent and thus span a two-dimensional subspace W2 ⊆ W containing
p2 elements. The second column must be chosen from W \W1 . Hence there are
pn − p choices for the second column. Continuing in this manner, we see that there
are pn − p2 choices for the third column, and so on. It follows that

|GLn (Zp )| = ∏_{i=0}^{n−1} (p^n − p^i ) = p^{n^2} ∏_{i=1}^{n} (1 − 1/p^i ). □

Example 7.4.2 Let n = 1. Then we may identify GL1 (Zp ) with U (Zp ). The
formula yields

|GL1 (Zp )| = p − 1

as expected. 

Proposition 7.4.3 Let p and q be distinct primes. Then

|GLn (Zpq )| = (pq)^{n^2} ∏_{i=1}^{n} (1 − 1/p^i )(1 − 1/q^i ).

Proof By Proposition 7.1.16, Zpq ≅ Zp × Zq as rings. Thus there is a ring
isomorphism

Matn (Zpq ) → Matn (Zp ) × Matn (Zq ),

which restricts to an isomorphism of the groups of units. It follows that

|GLn (Zpq )| = |GLn (Zp )| · |GLn (Zq )|
= p^{n^2} ∏_{i=1}^{n} (1 − 1/p^i ) · q^{n^2} ∏_{i=1}^{n} (1 − 1/q^i )
= (pq)^{n^2} ∏_{i=1}^{n} (1 − 1/p^i )(1 − 1/q^i )

by Lemma 7.4.1. □

Example 7.4.4 Let n = 2 and p = 2, q = 13. Then the formula yields

|GL2 (Z26 )| = 26^4 ∏_{i=1}^{2} (1 − 1/2^i )(1 − 1/13^i ) = 157,248.

So there are 157,248 elements in the group of units of the matrix ring Mat2 (Z26 ).
One of these units is

A = [ 10 7 ]
    [  1 5 ];

indeed, using the familiar formula

[ a b ] [  d −b ]
[ c d ] [ −c  a ] = (ad − bc) I2 ,

one obtains

A^{−1} = [ 11 21 ]
         [  3 22 ].

We can also compute A−1 using GAP:

gap> A:=[[10,7],[1,5]];
[ [ 10, 7 ], [ 1, 5 ] ]
gap> Inverse(A) mod 26;
[ [ 11, 21 ], [ 3, 22 ] ].
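The count |GL2 (Z26 )| = 157,248 can also be confirmed by brute force, since a matrix over Zn is invertible exactly when its determinant is a unit, i.e., gcd(det, n) = 1. A Python sketch of ours (illustrative; the book's computer examples use GAP):

```python
from math import gcd

def gl2_order(n):
    """Count invertible 2x2 matrices over Z_n by testing gcd(ad - bc, n) = 1."""
    return sum(1
               for a in range(n) for b in range(n)
               for c in range(n) for d in range(n)
               if gcd((a * d - b * c) % n, n) == 1)
```

Here gl2_order(26) returns 157248, agreeing with Proposition 7.4.3, and gl2_order(5) returns (5^2 − 1)(5^2 − 5) = 480, agreeing with Lemma 7.4.1.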

7.5 Exercises

1. As in Remark 7.1.2, verify that the coset operations on the quotient ring R/N
are well-defined on left cosets, i.e., if x + N = a + N and y + N = b + N,
then (x + N ) + (y + N) = (a + b) + N and (x + N) · (y + N) = ab + N.
2. Let R be a commutative ring with unity, and let N be an ideal of R. Show that
the map f : R → R/N defined as a → a + N is a surjective homomorphism
of rings.
3. Let ψ : R → R  be a homomorphism of commutative rings with unity.
Let U (R) and U (R  ) denote the groups of units, respectively. Prove that ψ
restricted to U (R) determines a homomorphism of groups ψ : U (R) → U (R  ).

4. Let ψ : Z/5Z → Z5 be defined as a + 5Z → (a mod 5). Show that ψ is an
isomorphism of rings.
5. Let R1 and R2 be rings and let R1 ×R2 denote the direct product of rings. Prove
that

Matn (R1 × R2 ) ≅ Matn (R1 ) × Matn (R2 )

as rings.

6. Let Q(√2) denote the simple algebraic field extension of Q.
(a) Compute (2 + √2)(5 − 2√2) in Q(√2).
(b) Compute (1 + √2)^{−1}.
7. Let F8 denote the finite field of 23 = 8 elements.
(a) Factor the polynomial x 8 − x into a product of irreducible polynomials over
F2 .
(b) Using invented roots, write F8 as a simple algebraic extension of F2 .
(c) Using part (b), write each element of F8 as a sequence of 3 bits.
(d) Using parts (b) and (c), compute 011 · 101 in F8 .
8. Let F9 and F81 be the Galois fields with 9 and 81 elements, respectively. Find an
irreducible polynomial f (x) ∈ F9 [x] and a root β of f (x) so that F81 = F9 (β).
9. Let n ≥ 1 be an integer and let Fp denote the Galois field with p elements.
Prove that there exists an irreducible polynomial of degree n over Fp .
10. Prove that f (x) = x 4 + x 2 + 1 is not a primitive polynomial over F2 .
11. Determine whether f (x) = x 3 + 2x 2 + 1 is a primitive polynomial over F3 .
12. Find the order of the units group U (Mat3 (Z10 )) = GL3 (Z10 ).
13. Find the order of the units group U (Matn (Fpm )) = GLn (Fpm ).
Chapter 8
Symmetric Key Cryptography

In general terms, a cryptosystem is a system of the form

⟨M, C, e, d, Ke , Kd ⟩

where M is the message space, C is the space of all possible cryptograms, e is the
encryption transformation, d is the decryption transformation, Ke is the encryption
keyspace, and Kd is the decryption keyspace.
The encryption transformation is a function

e : M × Ke → C.

For message M ∈ M and encryption key ke ∈ Ke ,

e(M, ke ) = C ∈ C.

The decryption transformation is a function

d : C × Kd → M.

For ke ∈ Ke and M ∈ M, there is a decryption key kd ∈ Kd so that

d(e(M, ke ), kd ) = M. (8.1)

In other words, the composition d(e(x, ke ), kd ) is the identity function on M.


If k = ke = kd is a secret key shared by Alice and Bob, then the cryptosystem is
a symmetric (key) cryptosystem. In a symmetric cryptosystem, K = Ke = Kd , and
we will often write a symmetric cryptosystem as

⟨M, C, e, d, K⟩.

© Springer Nature Switzerland AG 2022
R. G. Underwood, Cryptography for Secure Encryption, Universitext,
https://doi.org/10.1007/978-3-030-97902-7_8

This chapter concerns the setup, use, and cryptanalysis of the major symmetric
cryptosystems.

8.1 Simple Substitution Cryptosystems

Definition 8.1.1 (Simple Substitution Cryptosystem) Let

Σ = {0, 1, 2, 3, . . . , n − 1}

denote the alphabet of n letters. A message M ∈ M is a finite sequence of letters:

M = M0 M1 M2 · · · Mr−1 , Mi ∈ Σ.

The encryption transformation e is a permutation σk in ⟨Sn , ◦⟩, the symmetric group
on n letters. The keyspace is the set of integers K = {1, 2, 3, . . . , n!}, which is
considered as a set of indices for the n! permutations in Sn . The encryption key
ke ∈ K indicates which permutation to use for encryption.
Alice encrypts the plaintext message M = M0 M1 M2 · · · Mr−1 by computing

C = e(M, ke ) = C0 C1 C2 · · · Cr−1 ,

where Ci = σke (Mi ) for 0 ≤ i ≤ r − 1.


Bob then decrypts the ciphertext C = C0 C1 C2 · · · Cr−1 by first computing the
inverse σke^{−1} of the permutation σke in the group Sn . He then computes

M = σke^{−1}(C0 ) σke^{−1}(C1 ) σke^{−1}(C2 ) · · · σke^{−1}(Cr−1 ).

Note that Bob’s decrypting task is different from Alice’s encrypting task: given
k and a permutation σk , Alice computes Ci = σk (Mi ), while Bob first must find the
inverse σk−1 , and then compute Mi = σk−1 (Ci ). Both Alice and Bob use the same
permutation σk (though in different ways), and so the shared key for encryption and
decryption is k = ke = kd . This is analogous to the shared key k in the right shift
cipher from Chapter 1: The same key k is used differently, i.e., Alice shifts right k
places, while Bob shifts left k places.
The cryptosystem ⟨M, C, e, d, K⟩ described above is the simple substitution
cryptosystem. □
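A small Python sketch of the scheme (ours, for illustration; the function names are not the book's, and indexing the n! permutations by an integer key is implemented here by seeding a pseudorandom shuffle rather than by enumerating Sn):

```python
import random

def key_to_permutation(k, n=26):
    """Derive a permutation sigma_k of {0, ..., n-1} from the integer key k."""
    perm = list(range(n))
    random.Random(k).shuffle(perm)  # deterministic for a fixed key
    return perm

def encrypt(msg, perm):
    return [perm[m] for m in msg]          # C_i = sigma_k(M_i)

def decrypt(ct, perm):
    inv = [0] * len(perm)                  # Bob first inverts sigma_k ...
    for i, p in enumerate(perm):
        inv[p] = i
    return [inv[c] for c in ct]            # ... then M_i = sigma_k^{-1}(C_i)
```

As in condition (8.1), decrypt(encrypt(M, perm), perm) returns M for every message M over the alphabet.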

We show that the simple substitution cryptosystem “works," i.e., condition (8.1)
holds. To this end, let M = M0 M1 M2 · · · Mr−1 be a message in M. Then

d(e(M, k), k) = d(σk (M0 )σk (M1 )σk (M2 ) · · · σk (Mr−1 ), k)


= σk−1 (σk (M0 ))σk−1 (σk (M1 ))σk−1 (σk (M2 ))
· · · σk−1 (σk (Mr−1 ))

= M0 M1 M2 · · · Mr−1
= M.

Example 8.1.2 In this case, Σ = {0, 1, 2, 3, . . . , 25} is the set of 26 letters. The
message space consists of finite sequences of letters in Σ that correspond to plaintext
English messages upon encoding the ordinary letters as below:

A ↔ 0, B ↔ 1, C ↔ 2, D ↔ 3, . . . , Z ↔ 25.

The encryption transformation is a permutation σk in the symmetric group S26 ; k is
the shared secret key. For instance, suppose that

σk = ( 0  1  2  3  4  5  6  7  8  9 10 11 12 13
       4 14 18  8 13  1 24 19  2 25 15  9  3 20

      14 15 16 17 18 19 20 21 22 23 24 25
      23  5 11 17 12  6  0 21 16  7 10 22 ).
Then the encryption of

C E L L P H O N E ↔ 2 4 11 11 15 7 14 13 4

is

C = e(2 4 11 11 15 7 14 13 4, k)
= σk (2) σk (4) σk (11) σk (11) σk (15) σk (7) σk (14) σk (13) σk (4)
= 18 13 9 9 5 19 23 20 13 ↔ S N J J F T X U N.



Example 8.1.3 Suppose Σ = {0, 1, 2}, and let M = C = Σ∗ = {0, 1, 2}∗ ,
where Σ∗ denotes the set of all words of finite length over Σ. The encryption
transformation e is a permutation in S3 , the symmetric group on 3 letters. From
Section 5.3.1, we have

S3 = {σ1 , σ2 , σ3 , σ4 , σ5 , σ6 }.

Let M = 221010 be a message in M. To compute C = e(M, 4), we recall that

σ4 = ( 0 1 2
       0 2 1 ).

Thus

C = e(221010, 4)
  = σ4 (2)σ4 (2)σ4 (1)σ4 (0)σ4 (1)σ4 (0)
  = 112020.

Also,

σ4^{−1} = ( 0 1 2
            0 2 1 ),

and so the decryption of C = 212 is

d(212, 4) = σ4^{−1}(2)σ4^{−1}(1)σ4^{−1}(2) = 121. □

8.1.1 Unicity Distance of the Simple Substitution Cryptosystem

As discussed in Section 3.4, the unicity distance of a symmetric cryptosystem is a
lower bound for the smallest positive integer n0 for which

Spur(n0 ) = ∑_{C ∈ L^{n0} ∩ C} Pr(C^{n0} = C)(|W (C)| − 1) = 0.

As such, it is a theoretical measure of the cryptosystem's ability to withstand a
brute-force attack by key trial.
From the formula (3.1) given in Section 3.4, the unicity distance of the simple
substitution cryptosystem (with n = 26) is

u.d. = (log2 (|K|) bits) / (redundancy rate of English bits/char)
     = log2 (26!) / 3.2
     = 88.38 / 3.2
     ≈ 27.61 characters.

Thus, for the simple substitution cryptosystem, n0 ≥ 28; though the exact value
for n0 might be significantly larger than 28. If Malice captures 28 characters of
ciphertext C and performs a brute-force key trial, there still may be spurious keys
for C. There is certainly some ciphertext of length < 28 that will have at least one
spurious key.
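The computation above is easy to reproduce. A sketch of ours (the 3.2 bits/char redundancy figure for English is the book's):

```python
import math

def unicity_distance(keyspace_size, redundancy=3.2):
    """u.d. = log2(|K|) / (redundancy rate in bits per character)."""
    return math.log2(keyspace_size) / redundancy

# Simple substitution over 26 letters: |K| = 26!, giving roughly 27.6 characters.
```

The same function with |K| = 312 reproduces the affine-cipher figure of about 2.59 characters computed in Section 8.2.1.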

8.2 The Affine Cipher

Definition 8.2.1 (Affine Cipher) Let Σ = {0, 1, 2, 3, . . . , n − 1} denote the set of
n letters. A message M ∈ M is a finite sequence of letters:

M = M0 M1 M2 · · · Mr−1 , Mi ∈ Σ.

Let a be an integer 1 ≤ a ≤ n − 1 that satisfies gcd(n, a) = 1, and let b be any


integer in {0, 1, 2, . . . , n − 1}. The message M = M0 M1 M2 · · · Mr−1 is encrypted
letter by letter to yield the ciphertext C = C0 C1 C2 · · · Cr−1 where

Ci = e(Mi , (a, b)) = ((aMi + b) mod n).

The encryption key is the pair of integers k = (a, b).


By Proposition 6.2.2, the condition gcd(n, a) = 1 implies that a is a unit in Zn ,
that is, there exists an element a −1 of Zn so that

aa −1 = 1 = a −1 a.

The ciphertext C = C0 C1 C2 · · · Cr−1 is decrypted letter by letter to yield the


message M = M0 M1 M2 · · · Mr−1 with

Mi = d(Ci , (a, b)) = (a −1 (Ci − b) mod n).

The symmetric cryptosystem described above is the affine cipher. 



We check to see that the affine cipher works: Let M = M0 M1 M2 · · · Mr−1 be a
message. Then (8.1) holds since

d(e(Mi , k), k) = d(((aMi + b) mod n), k)


= (a −1 (((aMi + b) mod n) − b)) mod n)
= (a −1 (aMi ) mod n)
= (Mi mod n)
= Mi ,

for i = 0, 1, . . . , r − 1.
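The letter-by-letter transformation above can be sketched in a few lines of Python (ours, for illustration; Python's built-in pow(a, -1, n) computes a^{−1} mod n):

```python
def affine_encrypt(msg, a, b, n=26):
    """C_i = (a*M_i + b) mod n; requires gcd(a, n) = 1."""
    return [(a * m + b) % n for m in msg]

def affine_decrypt(ct, a, b, n=26):
    """M_i = a^{-1}(C_i - b) mod n."""
    a_inv = pow(a, -1, n)   # exists exactly because gcd(a, n) = 1
    return [(a_inv * (c - b)) % n for c in ct]
```

With the key (5, 18) of Example 8.2.2 below, affine_encrypt([2, 0, 19], 5, 18) gives [2, 18, 9], and affine_decrypt([7, 10, 22], 5, 18) gives [3, 14, 6].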
Here is an example of an affine cipher.
Example 8.2.2 Let = {0, 1, 2, 3, . . . , 25} be the set of 26 letters. The message
space consists of finite sequences of letters in . Since gcd(26, 5) = 1, we can
choose the encryption key to be k = (a, b) = (5, 18). Now

e(2 0 19, (5, 18)) = 2 18 9,



since C0 = ((5 · 2 + 18) mod 26) = 2, C1 = ((5 · 0 + 18) mod 26) = 18, and
C2 = ((5 · 19 + 18) mod 26) = 9.
To decrypt the ciphertext C = 7 10 22, we note that decrypting with the key
(5, 18) is the same as encrypting with the key (−5, 12), since

−5(5x + 18) + 12 = −25x − 78 ≡ x (mod 26).

Thus M0 = e(7, (−5, 12)) = ((−35 + 12) mod 26) = 3, M1 = e(10, (−5, 12)) =
((−50 + 12) mod 26) = 14, and M2 = e(22, (−5, 12)) = ((−110 + 12) mod 26) = 6.
Thus

3 14 6 = d(7 10 22, (5, 18)).



In the affine cipher, if we take a = 1, b ∈ Zn , then we obtain the shift cipher, in
which encryption is

Ci = e(Mi , (1, b)) = ((Mi + b) mod n),

and decryption is

Mi = d(Ci , (1, b)) = ((Ci − b) mod n).

The following example of a shift cipher should look familiar—it is the right shift
cryptosystem given in Section 1.1.
Example 8.2.3 Let Σ = {0, 1, 2, 3, . . . , 25} be the set of 26 letters. The message
space consists of finite sequences of letters in Σ that correspond to plaintext English
messages upon encoding the ordinary letters as below:

A ↔ 0, B ↔ 1, C ↔ 2, D ↔ 3, . . . , Z ↔ 25.

Now with the key chosen to be k = 21, the encryption of

M = G O O D W Y N H A L L ↔ 6 14 14 3 22 24 13 7 0 11 11

is

C = e(6 14 14 3 22 24 13 7 0 11 11, 21)


= 1 9 9 24 17 19 8 2 21 6 6
↔BJJYRTICVGG

since C0 = ((6 + 21) mod 26) = 1, C1 = ((14 + 21) mod 26) = 9, C2 = ((14 +
21) mod 26) = 9, and so on. 


Both the affine cipher and the shift cipher are special cases of the simple
substitution cryptosystem.
Proposition 8.2.4 The shift cipher and the affine cipher are simple substitution
cryptosystems.
Proof The encryption and decryption transformations for both the shift and affine
ciphers are bijective maps Σ → Σ, and hence are given by permutations in Sn . □

8.2.1 Unicity Distance of the Affine Cipher

We take n = 26. We first compute the size of the keyspace for the affine cipher. We
have

K = {(a, b) : gcd(a, 26) = 1, b ∈ Z26 }.

Thus

|K| = φ(26) · 26 = 12 · 26 = 312.

Consequently, the unicity distance is

log2 (312) / 3.2 ≈ 2.59 characters.

8.3 The Hill 2 × 2 Cipher

We generalize the affine cipher to form the Hill 2 × 2 cipher.


Definition 8.3.1 (Hill 2 × 2 Cipher) Take Σ to be the alphabet of 2-grams

Σ = L2 = {AA, AB, AC, . . . , ZX, ZY, ZZ}.

These 2-grams are known as blocks since they consist of blocks of letters from the
standard alphabet {A, . . . , Z}.
We encode each block of Σ as a 2-tuple of integers from 0 to 25; hence,

Σ = {0 0, 0 1, 0 2, . . . , 25 23, 25 24, 25 25}.

A message M ∈ M is a finite sequence of blocks in Σ:

M = M0 M1 M2 · · · Mr−1 , Mi ∈ Σ.
146 8 Symmetric Key Cryptography

We consider each block Mi as a 1 × 2 matrix (mi,1 mi,2 ), where mi,1 , mi,2 ∈ Z26 .
Let A be an invertible 2 × 2 matrix with entries in Z26 . These are precisely the
2 × 2 matrices of the form

A = [ a b ]
    [ c d ] ,

where a, b, c, d ∈ Z26 and ad − bc ∈ U (Z26 ).
Let B^T denote the transpose of any matrix B.
The message M = M0 M1 M2 · · · Mr−1 is encrypted block by block to yield the
ciphertext C = C0 C1 C2 · · · Cr−1 , where

Ci = e(Mi , A) = (A Mi^T )^T = ci,1 ci,2 ,

with

ci,1 = ((a mi,1 + b mi,2 ) mod 26),
ci,2 = ((c mi,1 + d mi,2 ) mod 26).

The encryption key is the matrix A.


Since A is an invertible matrix in Mat2 (Z26 ), there is a unique matrix A^{−1} with

A A^{−1} = I2 = A^{−1} A,

where I2 is the 2 × 2 identity matrix [ 1 0 ; 0 1 ].
The ciphertext C = C0 C1 C2 · · · Cr−1 is decrypted block by block to yield the
message M = M0 M1 M2 · · · Mr−1 with

Mi = d(Ci , A) = (A^{−1} Ci^T )^T ,

modulo 26. 

The symmetric cryptosystem described above is the Hill 2 × 2 cipher, named
after Lester S. Hill (1891–1961).

We leave it as an exercise to show that the Hill 2 × 2 cipher works.


The Hill 2 × 2 cipher is a generalization of an affine cipher with key (a, 0): An
invertible element a ∈ Z26 can be viewed as an invertible 1×1 matrix in Mat1 (Z26 ).
The Hill 2 × 2 cipher is an example of a block cipher. Instead of encrypting
a message letter by letter, it encrypts blocks of 2 letters at a time. We will discuss
block ciphers over the alphabet of bits in Section 8.7.
Here is an example of a Hill 2 × 2 cipher.
Example 8.3.2 Let

M = Eyes of the World.

We capitalize and consider M as seven blocks of 2-grams, hence

M = EY ES OF TH EW OR LD.

Encoding as blocks of 2-grams over Z26 gives

M = 4 24 4 18 14 5 19 7 4 22 14 17 11 3.
 
With key

A = [ 10 7 ]
    [  1 5 ]  ∈ GL2 (Z26 ),

encryption is given as

C1 = e(M1 , A) = (A (4 24)^T )^T = (0 20),
C2 = e(M2 , A) = (A (4 18)^T )^T = (10 16),
. . .
C7 = e(M7 , A) = (A (11 3)^T )^T = (1 0),

so that the ciphertext is

C = 0 20 10 16 19 13 5 2 12 10 25 21 1 0,

which decodes as

C = AU KQ TN FC MK ZV BA. □
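The block computations of Example 8.3.2 can be reproduced with a short sketch of ours (illustrative names; the inverse key is formed with the adjugate formula quoted in Example 7.4.4):

```python
def hill_encrypt_block(block, key, n=26):
    """Encrypt one 2-gram (m1, m2) with key [[a, b], [c, d]] over Z_n."""
    (a, b), (c, d) = key
    m1, m2 = block
    return ((a * m1 + b * m2) % n, (c * m1 + d * m2) % n)

def hill_decrypt_block(block, key, n=26):
    """Decrypt by encrypting with A^{-1} = (ad - bc)^{-1} * adj(A) mod n."""
    (a, b), (c, d) = key
    det_inv = pow((a * d - b * c) % n, -1, n)  # key must lie in GL2(Z_n)
    inv_key = ((det_inv * d % n, det_inv * -b % n),
               (det_inv * -c % n, det_inv * a % n))
    return hill_encrypt_block(block, inv_key, n)
```

With A = ((10, 7), (1, 5)), the block (4, 24) (EY) encrypts to (0, 20) (AU), and decrypting (0, 20) recovers (4, 24).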




8.3.1 Unicity Distance of the Hill 2 × 2 Cipher

In Section 3.2, we computed the entropy rate of plaintext English over the
ordinary alphabet L1 = {A, B, . . . , Z}. We obtained

H∞ = lim_{n→∞} Hn /n ≈ 1.5 bits/char,

where Hn denotes the entropy of the n-gram relative frequency distribution. The
redundancy rate of plaintext is thus 3.2 bits/char.
To compute the unicity distance of the Hill 2×2 cipher, we first need to reconsider
the redundancy rate per character, given that plaintext messages are now written as
words over the alphabet of 2-grams,

= L2 = {AA, AB, AC, . . . , ZX, ZY, ZZ}.

Thus, a single character (or letter) is now a 2-gram block.


We need to consider the sequence of entropy rates

{H2n /n} = H2 /1, H4 /2, H6 /3, . . .

The limit

lim_{n→∞} H2n /n

is the entropy rate of plaintext when written in the alphabet L2 .
We compute this limit as follows. The sequence {H2n /2n} is a subsequence of
the convergent sequence {Hn /n}, and thus, by Rudin [51, p. 51], {H2n /2n} converges
to the same limit as {Hn /n}. So we take

lim_{n→∞} H2n /2n = 1.5,

and thus

lim_{n→∞} H2n /n = 2 lim_{n→∞} H2n /2n = 2(1.5) = 3.0.

Hence, the entropy rate is 3.0 bits/char.


Now, the maximum entropy is log2 (26^2 ) ≈ 9.4 bits/char, and so the redundancy
rate in the Hill 2 × 2 cipher is

9.4 − 3.0 = 6.4 bits/char.



The keyspace K consists of all invertible 2 × 2 matrices over Z26 . From
Section 7.4, we find that |K| = |GL2 (Z26 )| = 157,248. Thus, the unicity distance
of the Hill 2 × 2 cipher is

log2 (157,248) / 6.4 ≈ 2.69 characters.

Of course, a character is actually a 2-gram block, so we may take the unicity distance
of the Hill 2 × 2 cipher to be 5.38.
The unicity distance of the Hill 2 × 2 cipher (u.d. = 5.38) is larger than that of
the affine cipher (u.d. = 2.59) but smaller than the unicity distance of the simple
substitution cryptosystem (u.d. = 27.61).
The Hill 2 × 2 cipher can be generalized to the Hill n × n cipher, n > 2, which is
a block cipher that encrypts blocks of n letters at a time using matrices in GLn (Z26 ).
See [47, §8.2].
Using Proposition 7.4.3, one can show that the unicity distance of the Hill n × n
cipher is

log2 ( 26^{n^2} ∏_{i=1}^{n} (1 − 1/2^i )(1 − 1/13^i ) ) / (3.2n)

characters, where each character is a block of length n.

8.4 Cryptanalysis of the Simple Substitution Cryptosystem

Suppose Alice and Bob are using the simple substitution cryptosystem
⟨M, C, e, d, K⟩ to communicate securely. They have met previously to establish
a shared secret key k = ke = kd . The shared key k is an integer 1 ≤ k ≤ 26!
and indicates which of the 26! permutations is to be used to encrypt and decrypt
messages. Alice will use permutation σk to encrypt, and Bob will use its inverse
σk^{−1} to decrypt.
Malice has eavesdropped on their conversation and has obtained 206 characters
of ciphertext:
C=
EVWB WB X FZZD XFZNE VZG LZYLQREB WY SXEVQSXEWLB XUQ NBQK EZ
KQCQTZR EVQ FXBWL LZSRZYQYEB ZH LUJREZAUXRVJ XE EWSQB EVQ
SXEVQSXEWLXT HZNYKXEWZYB XUQ XB WSRZUEXYE XB EVQ LUJREZAUXRVJ
EVQ EQME XBBNSQB FXBWL DYZGTQKAQ ZH LXTLNTNB XYK TWYQXU
XTAQFUX

We can view C as a single word of length 244 over the 27-character alphabet

{A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, ␣},

where the additional character ␣ is the symbol for a word separator (white space).
There are 206 ordinary letters plus 38 word separators. The permutation σk used by
Alice satisfies σk (␣) = ␣. Thus there are still only 26! possible keys.
Whether or not we view the white space as a character, Malice’s goal is to
determine Alice’s original plaintext message using a method of cryptanalysis.
Malice knows that the brute-force method of key trial will likely determine the
correct key, and thus break the cipher, since the unicity distance of the simple
substitution cryptosystem is 28 characters.
Malice wants to avoid the time-consuming method of key trial; key trial is
almost impossible since there are 26! ≈ 4 × 1026 keys, nearly beyond a feasible
computation.
Instead, Malice chooses to make an educated guess as to what the key could
be. To do this, he will employ a technique of cryptanalysis called frequency
analysis. Frequency analysis employs the (known) relative frequency distributions
of plaintext English n-grams.
We have the 1-gram relative frequency distribution f1 : L1 → [0, 1] of plaintext
English (Figure 3.3), which in table form appears as

Letter Prob. Letter Prob.


A 0.0804 N 0.0709
B 0.0154 O 0.0760
C 0.0306 P 0.0200
D 0.0399 Q 0.0011
E 0.1251 R 0.0612
F 0.0230 S 0.0654
G 0.0196 T 0.0925
H 0.0549 U 0.0271
I 0.0726 V 0.0099
J 0.0016 W 0.0192
K 0.0067 X 0.0019
L 0.0414 Y 0.0173
M 0.0253 Z 0.0009

The eight highest relative frequencies occur for the letters

E, T, A, O, I, N, S, H.

By Bernoulli’s theorem (or the Law of Large Numbers), in a sample of plaintext


English consisting of 206 letters, we can expect the following frequencies for these
letters:

Letter Expected freq. Letter Expected freq.

E 0.1251 × 206 ≈ 26    I 0.0726 × 206 ≈ 15
T 0.0925 × 206 ≈ 19    N 0.0709 × 206 ≈ 15
A 0.0804 × 206 ≈ 17    S 0.0654 × 206 ≈ 13
O 0.0760 × 206 ≈ 16    H 0.0549 × 206 ≈ 11

In the collection of 206 ciphertext characters that Malice has obtained, the eight
highest letter frequencies are

Letter Frequency Letter Frequency


X 24 B 16
E 22 L 11
Q 20 W 11
Z 17 V 10
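Tallies like those in this table are produced mechanically. A Python sketch of ours (illustrative) that counts the letters of a ciphertext and lists the most frequent first:

```python
from collections import Counter

def letter_frequencies(ciphertext):
    """Count ciphertext letters, ignoring word separators and other symbols."""
    counts = Counter(ch for ch in ciphertext if ch.isalpha())
    return counts.most_common()
```

Applied to Malice's captured 244-character string, the head of this list reproduces the table above.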

From these data, together with knowledge of common English words of lengths
one, two, and three, Malice can determine the plaintext as follows.
Malice first compares the expected frequencies in the plaintext with the actual
frequencies in the ciphertext and guesses that Alice has encrypted the plaintext using
a permutation σk in which

E → X.

Notice that the ciphertext contains the 2-gram XE, which is the encryption of a 2-
letter word in English. Thus, if Alice had used a permutation with E → X, then XE
is the encryption of a 2-letter word in English that begins with E. There are no such
words in common usage.
So a better guess by Malice is that

E → E.

(based on a comparison of frequencies.) But then the ciphertext 2-gram EZ is the


encryption of a 2-letter word in English that begins with E, which is again not
possible.
So Malice assumes that E → Q. But now, the 3-gram EVQ in the ciphertext
is the encryption of a 3-letter word in English ending in E; it is likely that the
plaintext word is THE, which is the most common 3-letter word in English. So
Malice concludes further that

T → E, H → V.

Now, the ciphertext 2-gram EZ is the encryption of a 2-letter word in English


beginning with T, which Malice guesses to be TO. Thus

O → Z.

Finally, Malice guesses that the ciphertext 1-gram X is most likely the encryption of
the English 1-gram A. Thus Malice guesses that Alice’s permutation satisfies

A → X.

After this analysis, Malice concludes that Alice’s encryption permutation is of


the form

A B C D E F G H I J K L M
X ∗ ∗ ∗ Q ∗ ∗ V ∗ ∗ ∗ ∗ ∗

N O P Q R S T U V W X Y Z
.
∗ Z ∗ ∗ ∗ ∗ E ∗ ∗ ∗ ∗ ∗ ∗

Decryption of the ciphertext therefore uses an inverse σk−1 of the form



A B C D E F G H I J K L M
∗ ∗ ∗ ∗ T ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗

N O P Q R S T U V W X Y Z
.
∗ ∗ ∗ E ∗ ∗ ∗ ∗ H ∗ A ∗ O
Malice then computes M = d(C, k) to obtain M =
TH∗∗ ∗∗ A ∗OO∗ A∗O∗T HO∗ ∗O∗∗E∗T∗ ∗∗ ∗ATHE∗AT∗∗∗ A∗E ∗∗E∗ TO
∗E∗E∗O∗ THE ∗A∗∗∗ ∗O∗∗O∗E∗T∗ O∗ ∗∗∗∗TO∗∗A∗H∗ AT T∗∗E∗ THE
∗ATHE∗AT∗∗A∗ ∗O∗∗∗AT∗O∗∗ A∗E A∗ ∗∗∗O∗TA∗T A∗ THE ∗∗∗∗TO∗∗A∗H∗
THE TE∗T A∗∗∗∗E∗ ∗A∗∗∗ ∗∗O∗∗E∗∗E O∗ ∗A∗ ∗ ∗ ∗ ∗∗ A∗∗ ∗∗∗EA∗
A∗∗E∗∗A

Now, in the partial decryption, we guess that the word A∗E is ARE. Thus

ARE → XUQ

and so, Malice concludes that Alice’s permutation satisfies

R → U.

Moreover, we guess that ∗ATHE∗AT∗∗∗ is MATHEMATICS, and


∗ATHE∗AT∗∗A∗ is MATHEMATICAL. Thus

MATHEMATICS → SXEVQSXEWLB, MATHEMATICAL → SXEVQSXEWLXT,

thus σk satisfies

M → S, I → W, C → L, S → B, L → T.

Malice now refines his guess for σk and deduces that the inverse σk−1 is of the
form

A B C D E F G H I J K L M
∗ S ∗ ∗ T ∗ ∗ ∗ ∗ ∗ ∗ C ∗

N O P Q R S T U V W X Y Z
.
∗ ∗ ∗ E ∗ M L R H I A ∗ O

With this refinement of σk−1 , Malice computes M = d(C, k) to obtain M =


THIS IS A ∗OO∗ A∗O∗T HO∗ CO∗CE∗TS I∗ MATHEMATICS ARE ∗SE∗ TO
∗E∗ELO∗ THE ∗ASIC COM∗O∗E∗TS O∗ CR∗∗TO∗RA∗H∗ AT TIMES THE
MATHEMATICAL ∗O∗∗∗ATIO∗S ARE AS IM∗ORTA∗T AS THE CR∗∗TO∗RA∗H∗
THE TE∗T ASS∗MES ∗ASIC ∗∗O∗LE∗∗E O∗ CALC∗L∗S A∗∗ LI∗EAR
AL∗E∗RA
In this manner, Malice continues to refine σk−1 , and ultimately, obtains the
plaintext,
THIS IS A BOOK ABOUT HOW CONCEPTS IN MATHEMATICS ARE USED TO
DEVELOP THE BASIC COMPONENTS OF CRYPTOGRAPHY AT TIMES THE
MATHEMATICAL FOUNDATIONS ARE AS IMPORTANT AS THE CRYPTOGRAPHY
THE TEXT ASSUMES BASIC KNOWLEDGE OF CALCULUS AND LINEAR ALGEBRA

from the Preface


The technique of frequency analysis works well in this example because the
plaintext English message M (the Preface selection) exhibits a 1-gram frequency
distribution that is not very different from the accepted 1-gram distribution
f1 : L1 → [0, 1] of plaintext English. See Figure 8.1.
More precisely, if g : L1 → [0, 1] is the 1-gram relative frequency distribution
of M, then the total variation distance

Dg = (1/2) Σα∈L1 |g(α) − f1 (α)|

will be small.
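As an illustration, the total variation distance can be computed with a short Python function (a sketch of ours, not part of the text; distributions are passed as dictionaries mapping letters to relative frequencies):

```python
def total_variation(g, f):
    """Total variation distance D = (1/2) * sum over letters of |g(a) - f(a)|.

    A letter missing from a dictionary is taken to have frequency 0.
    """
    letters = set(g) | set(f)
    return 0.5 * sum(abs(g.get(a, 0.0) - f.get(a, 0.0)) for a in letters)
```

For identical distributions the distance is 0; for distributions concentrated on different letters it is 1.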
On the other hand, as a method of cryptanalysis, frequency analysis does not
work as well if the plaintext M is not typical English in terms of letter frequencies.
For example, the 82 letters of plaintext
M=
HYDROGEN HELIUM LITHIUM BERYLLIUM BORON CARBON NITROGEN
OXYGEN
FLUORINE NEON SODIUM MAGNESIUM
exhibit 1-gram frequencies that vary significantly from the accepted frequencies (see
Figure 8.2).
Fig. 8.1 Accepted 1-gram frequencies (blue), compared to 1-gram frequencies of the Preface selection (red)

Fig. 8.2 Accepted 1-gram frequencies (blue), compared to 1-gram frequencies of the atomic elements (red)

Thus it would be relatively difficult to decrypt C = e(M, k) if M were encrypted using a simple substitution cryptosystem.
Block ciphers (such as the Hill 2 × 2 cipher) fare better under an attack
by frequency analysis since 2-gram frequencies must be used (Figure 3.4, [35,
Table 2.3.4]) rather than 1-gram frequencies. Hence more characters of ciphertext
must be captured to make good guesses for the encryption of plaintext 2-grams.
Indeed, the unicity distance of the Hill 2 × 2 cipher is larger than that of the affine
cipher.
In each of the symmetric cryptosystems discussed above (the simple substitution
cryptosystem, the affine cipher, and the shift cipher), each letter of the plaintext
message is encrypted as a unique letter in the ciphertext. This is also true for the
Hill cipher, since the characters that make up the plaintext (2-gram blocks of
letters AA, AB, and so on) are encrypted as unique 2-gram blocks of letters in the
ciphertext.
These ciphers are “monoalphabetic” cryptosystems.
Definition 8.4.1 A monoalphabetic cryptosystem is a cryptosystem in which
each letter of the plaintext message is encrypted as a unique letter in the ciphertext.
Monoalphabetic cryptosystems have security concerns for the following reasons.
(1) A monoalphabetic cryptosystem is vulnerable to a known-plaintext attack. For
instance, let M, C, e, d, K be the simple substitution cryptosystem with Σ =
{0, 1, 2, 3, . . . , 25}. If

5 13 2 20 9 8 5 = e(0 11 6 4 1 17 0, k),

then we can immediately infer that

1 4 0 17 = d(9 20 5 8, k).

(2) A monoalphabetic cryptosystem is vulnerable to a ciphertext-only attack using


frequency analysis. For instance, in plaintext English the letter A occurs with
probability ≈ 0.08. If the letter Z occurs with relative frequency ≈ 0.08 in a
sample of ciphertext, then we could make a good guess that e(A, k) = Z since
Z is the encryption of exactly one plaintext letter.
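The known-plaintext attack in item (1) amounts to tabulating the substitution pairs revealed by a plaintext/ciphertext pairing. A minimal Python sketch (the helper names are ours, not the text's):

```python
def partial_key(plain, cipher):
    """Record the pairs sigma(m) = c revealed by a known plaintext/ciphertext pair."""
    sigma = {}
    for m, c in zip(plain, cipher):
        if m in sigma and sigma[m] != c:
            raise ValueError("inconsistent plaintext/ciphertext pair")
        sigma[m] = c
    return sigma

def partial_decrypt(cipher, sigma):
    """Invert the known part of sigma; letters not yet determined map to None."""
    inverse = {c: m for m, c in sigma.items()}
    return [inverse.get(c) for c in cipher]

# The pairing from item (1): 5 13 2 20 9 8 5 = e(0 11 6 4 1 17 0, k)
sigma = partial_key([0, 11, 6, 4, 1, 17, 0], [5, 13, 2, 20, 9, 8, 5])
# ... immediately gives 1 4 0 17 = d(9 20 5 8, k)
assert partial_decrypt([9, 20, 5, 8], sigma) == [1, 4, 0, 17]
```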
We shall address these security issues in the next section with the introduction of
“polyalphabetic” cryptosystems.

8.5 Polyalphabetic Cryptosystems

Definition 8.5.1 A polyalphabetic cryptosystem is a cryptosystem in which a


letter in the plaintext message is encrypted as more than one letter in the ciphertext.
We review some important polyalphabetic cryptosystems.

8.5.1 The Vigenère Cipher

Definition 8.5.2 (Vigenère Cipher) Let Σ = {0, 1, 2, 3, . . . , n − 1} denote the set
of n letters. A message M ∈ M is a finite sequence of r letters:

M = M0 M1 M2 . . . Mr−1 , Mi ∈ Σ.

The encryption–decryption key k is a finite sequence of s letters:

k = k0 k1 k2 . . . ks−1 , ki ∈ Σ.

The message M = M0 M1 M2 · · · Mr−1 is encrypted letter by letter to yield the


ciphertext C = C0 C1 C2 . . . Cr−1 where

Ci = e(Mi , k) = ((Mi + k(i mod s) ) mod n).

The ciphertext C = C0 C1 C2 · · · Cr−1 is decrypted letter by letter to yield the


message M = M0 M1 M2 · · · Mr−1 with

Mi = d(Ci , k) = ((Ci − k(i mod s) ) mod n).



We leave it to the reader to show that the Vigenère cipher works, i.e., (8.1) holds.
Here is an example.
Example 8.5.3 Let Σ = {0, 1, 2, 3, . . . , 25} be the set of 26 letters, and let
M, C, e, d, K be the Vigenère cipher with encryption–decryption key

k = k0 k1 k2 k3 = 3 7 20 12.

The encryption of M = 18 19 0 19 4 14 5 0 11 0 1 0 12 0 is

e(18 19 0 19 4 14 5 0 11 0 1 0 12 0, 3 7 20 12)
= 21 0 20 5 7 21 25 12 14 7 21 12 15 7

since

C0 = ((M0 + k0 ) mod 26) = ((18 + 3) mod 26) = 21,


C1 = ((M1 + k1 ) mod 26) = ((19 + 7) mod 26) = 0,
C2 = ((M2 + k2 ) mod 26) = ((0 + 20) mod 26) = 20,
C3 = ((M3 + k3 ) mod 26) = ((19 + 12) mod 26) = 5,
C4 = ((M4 + k0 ) mod 26) = ((4 + 3) mod 26) = 7,
C5 = ((M5 + k1 ) mod 26) = ((14 + 7) mod 26) = 21,


..
.
C13 = ((M13 + k1 ) mod 26) = ((0 + 7) mod 26) = 7.

The encryption can be done efficiently using “vertical addition” modulo 26:

18 19 0 19 4 14 5 0 11 0 1 0 12 0
3 7 20 12 3 7 20 12 3 7 20 12 3 7
21 0 20 5 7 21 25 12 14 7 21 12 15 7



The Vigenère cipher is polyalphabetic since the letter 0 in the plaintext message
encrypts as the letters 20, 12, and 7. Translating back to the ordinary alphabet, we
see that the letter A encrypts as the letters U,M,H.
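Encryption and decryption in the Vigenère cipher are short enough to sketch directly in Python (a sketch of ours, working over Σ = {0, 1, . . . , n − 1} with messages and keys as lists of letters):

```python
def vigenere_encrypt(M, k, n=26):
    """C_i = (M_i + k_(i mod s)) mod n, where s = len(k)."""
    s = len(k)
    return [(m + k[i % s]) % n for i, m in enumerate(M)]

def vigenere_decrypt(C, k, n=26):
    """M_i = (C_i - k_(i mod s)) mod n."""
    s = len(k)
    return [(c - k[i % s]) % n for i, c in enumerate(C)]
```

With the key 3 7 20 12 of Example 8.5.3, vigenere_encrypt reproduces the ciphertext computed above.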

8.5.2 Unicity Distance of the Vigenère Cipher

We take n = 26, so that we are essentially using the ordinary alphabet of 26 letters.
We first note that the size of the keyspace in the Vigenère cipher is

|K| = 26^s ,

where s is the key length. Thus (3.1) yields

u.d. = log2 (26^s )/3.2 = s log2 (26)/3.2 ≈ 4.7s/3.2.
In the case of Example 8.5.3, we have s = 4, thus the u.d. is 5.87.

8.5.3 Cryptanalysis of the Vigenère Cipher

The Vigenère cipher is less vulnerable than a monoalphabetic cryptosystem to a


ciphertext-only attack using frequency analysis. This is the case since a ciphertext
letter can be the encryption of more than one plaintext letter, thus the relative
frequency of a ciphertext letter reveals less about possible plaintext decryptions.
Still, the Vigenère cipher can be broken using a modification of frequency
analysis.

Suppose that message M is encrypted using the Vigenère cipher with key k of
length s. The method of cryptanalysis used (to determine M knowing C = e(M, k))
depends on whether the key length is known to the attacker.

Key Length Is Known

Suppose Alice and Bob are using the Vigenère cipher M, C, e, d, K with shared
key k of length s. Malice knows the value of s and has obtained m ≫ s characters
of ciphertext:

C0 C1 C2 . . . Cm−1 ,

Ci = e(Mi , k).
To simplify matters we assume that s | m; let q = m/s. Consider the subsets of
ciphertext characters:

C0 Cs C2s . . . C(q−1)s

C1 C1+s C1+2s . . . C1+(q−1)s

C2 C2+s C2+2s . . . C2+(q−1)s

..
.

Cs−1 C2s−1 C3s−1 . . . Cqs−1

Each subset can be viewed as the encryption of a subset of plaintext letters using the
shift cipher with key ki for some 0 ≤ i ≤ s − 1. For example,

C0 Cs C2s . . . C(q−1)s = e(M0 Ms M2s . . . M(q−1)s , k0 )

since

Cjs = ((Mjs + k(js mod s) ) mod n) = ((Mjs + k0 ) mod n)

for 0 ≤ j ≤ q − 1.
Using the method of frequency analysis on each subset of ciphertext letters, we
can then determine the most likely values for k0 , k1 , . . . , ks−1 .
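This subset analysis can be sketched in Python: split the ciphertext into s interleaved subsets and, in each, assume the most frequent ciphertext letter is the encryption of the most common plaintext letter (E = 4 for English). This is a simplified heuristic of ours, not the text's exact procedure:

```python
from collections import Counter

def guess_key(C, s, n=26, common=4):
    """Guess k_0, ..., k_(s-1): subset i is a shift cipher with key k_i, so the
    most frequent letter in C_i C_(i+s) C_(i+2s) ... is likely e(common, k_i)."""
    key = []
    for i in range(s):
        subset = C[i::s]
        top, _ = Counter(subset).most_common(1)[0]
        key.append((top - common) % n)
    return key
```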
Example 8.5.4 Suppose Alice and Bob are using the Vigenère cipher with a key
length of 2. Malice knows the key length and has obtained 290 characters of
ciphertext C =
KUFEXBGDWRHBLUYUWXDJLMDIVJDDGYQWWXHHHYQJKUZYWSKIZYQTRMDDGYZQY
UGLHHBIOEZBBQWXLCDDGXHMDLHTYUUOVBRMOODJPURKUMDLLDJIHUPUGJRRHL
HHBTLIWQQJWHDLHBOYQWIHRCRKUQUCVBLAHJZESURFOUZQYYQWDJHQFXRJKUU
YQTLVIUUUQJFYWYHISUUXDFVRHJZUHDWQFEPQDDGIDBHCDDGEXHZQYYQWZQVC
HHHBBQQUFXREIJKULHZQYYQWDSUEVIWXRKVQQTVEICLBHI

In this case, s = 2, q = 145, and there are 2 subsets of ciphertext:


C0 C2 C4 . . . C288 =
KFXGWHLYWDLDVDGQWHHQKZWKZQRDGZYGHBOZBWLDGHDHYUVRODPRUD
LJHPGRHHBLWQWDHOQIRRUUVLHZSROZYQDHFRKUQLIUQFWHSUDVHZHW
FPDGDHDGXZYQZVHHBQFRIKLZYQDUVWRVQVILH
and C1 C3 C5 . . . C289 =
UEBDRBUUXJMIJDYWXHYJUYSIYTMDYQULHIEBQXCDXMLTUOBMOJUKML
DIUUJRLHTIQJHLBYWHCKQCBAJEUFUQYWJQXJUYTVUUJYYIUXFRJUDQ
EQDIBCDEHQYWQCHBQUXEJUHQYWSEIXKQTECBI
In the subset C0 C2 . . . C288 , the letter H occurs most often (19 times), and so we
deduce that the encryption transformation takes E to H, thus k0 = 3. (One could also
obtain k0 = 3 using key trial on the subset C0 C2 . . . C288 .)
Likewise, in the subset C1 C3 . . . C289 , the letter U occurs most often, and so we
deduce that the encryption transformation takes E to U, thus k1 = 16.
The correct key is indeed k = (3, 16) and we obtain M = d(C, (3, 16)) =
He couldn’t believe that I was standing there in the
witch’s window and I waved very slowly at him and he waved
very slowly at me. Our waving seemed to be very distant
travelling from our arms like two people waving at each other
in different cities, perhaps between Tacoma and Salem, and
our waving was merely an echo of their waving across
thousands of miles.
Richard Brautigan, from 1692 Cotton Mather Newsreel



Key Length Is Not Known

Now suppose that message M is encrypted using the Vigenère cipher with key k.
Suppose that Malice or an attacker does not know the length of the key. The first
task for the attacker is to find the length of the key. To this end, we use the Kasiski
method.
We take advantage of the fact that certain 2-grams appear relatively frequently in
plaintext English. For instance the 2-gram TH appears with probability ≈0.03 (see
Figure 3.4).
If the ratio of key length to the length of the plaintext message is small enough,
then it is likely that some occurrences of common 2-grams (e.g., TH) in the plaintext
will coincide with the same letters in the key. When this happens, the distances
between the occurrences are multiples of the key length.

With this in mind, we look for 2-grams in the ciphertext that appear relatively
often and compute the gcd of the distances between their occurrences. This value is
a good guess for the key length.
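The Kasiski method can be sketched as follows. This simplified version of ours takes the gcd over the distances of all repeated 2-grams; in practice one would examine the distances of the most frequent repeats, as in Example 8.5.5:

```python
from math import gcd
from functools import reduce
from collections import defaultdict

def kasiski(C, gram=2):
    """Guess the key length as the gcd of distances between repeated grams."""
    positions = defaultdict(list)
    for i in range(len(C) - gram + 1):
        positions[tuple(C[i:i + gram])].append(i)
    # Distances between consecutive occurrences of each repeated gram.
    distances = [q - p
                 for pos in positions.values() if len(pos) > 1
                 for p, q in zip(pos, pos[1:])]
    return reduce(gcd, distances) if distances else None
```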
Example 8.5.5 In the ciphertext of Example 8.5.4 the 2-gram WX appears 4 times;
the distances between occurrences are 16, 42, 58. We have

gcd(16, 42) = gcd(42, 58) = gcd(16, 58) = 2,

and so it is likely that the key length is 2, which is correct.


Once the key length has been established, the attacker proceeds exactly as in the
“Key Length Is Known” case above.

Our next polyalphabetic symmetric cryptosystem is a special case of the Vigenère
cipher.

8.5.4 The Vernam Cipher

Let Σ = {0, 1} denote the set of 2 letters (bits), and let Σ∗ = {0, 1}∗ denote the set
of all finite sequences of 0’s and 1’s. Let a = a1 a2 · · · am , b = b1 b2 · · · bm ∈ {0, 1}∗ .
Then bit-wise addition modulo 2 is defined as

a ⊕ b = c1 c2 · · · cm ,

where

ci = ((ai + bi ) mod 2).

Definition 8.5.6 (Vernam Cipher) Let Σ = {0, 1} and M = C = {0, 1}∗ . A


message is a finite sequence of bits

M = M0 M1 M2 · · · Mr−1 , Mi ∈ {0, 1}.

The key for encryption and decryption is a sequence of r letters

k = k0 k1 k2 · · · kr−1 ,

chosen uniformly at random from the set {0, 1} and shared as a secret key by Alice
and Bob. It is important to note that the key must be the same length r as the
message. The encryption of M is given as

C = e(M, k) = M ⊕ k.

Decryption proceeds as follows:

M = d(C, k) = C ⊕ k.

The Vernam cipher works since

d(e(M, k), k) = d(M ⊕ k, k)


= (M ⊕ k) ⊕ k
= M ⊕ (k ⊕ k)
= M.

Here is an example of a Vernam cipher.


Example 8.5.7 We begin with the familiar correspondence:

A ↔ 0, B ↔ 1, C ↔ 2, D ↔ 3, . . . , Z ↔ 25.

We then write the numbers in 5-bit binary, giving the encoding

A ↔ 00000, B ↔ 00001, C ↔ 00010, D ↔ 00011, . . . , Z ↔ 11001.

Now, for instance,

C A T ↔ 00010 00000 10011

Alice and Bob have established the shared secret key

k = 11100 10100 10010,

where each bit is chosen uniformly at random from {0, 1}, and so the encryption of
C A T is

C = e(00010 00000 10011, 11100 10100 10010)


= 00010 00000 10011 ⊕ 11100 10100 10010
= 11110 10100 00001.

Note: C may not correspond to a sequence of ordinary alphabet letters.
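The Vernam cipher on bit strings can be sketched in Python, using the standard secrets module to draw the uniformly random key (a sketch of ours, with bit strings represented as strings of '0' and '1'):

```python
import secrets

def xor_bits(a, b):
    """Bit-wise addition modulo 2 of two equal-length bit strings."""
    assert len(a) == len(b)
    return ''.join('1' if x != y else '0' for x, y in zip(a, b))

def random_key(r):
    """An r-bit key, each bit chosen uniformly at random."""
    return ''.join(secrets.choice('01') for _ in range(r))

def vernam_encrypt(M, k):
    return xor_bits(M, k)       # the key must be as long as the message

vernam_decrypt = vernam_encrypt  # decryption is the same XOR
```

With the key of Example 8.5.7, vernam_encrypt('000100000010011', '111001010010010') returns '111101010000001', the ciphertext computed above.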



The Vernam cipher is polyalphabetic; we have

Pr[e(0, k) = 0] = 1/2, Pr[e(0, k) = 1] = 1/2.

Perfect Secrecy

A cryptosystem M, C, e, d, K provides “perfect secrecy” if knowledge of C gives


no new, additional information about the message M. As we shall see, the Vernam
cipher has perfect secrecy. Here is a formal definition.
Let (Ω, A, Pr) be an abstract probability space, let XM : Ω → M, XC : Ω → C
denote random variables, where

(XM = M) = {ω ∈ Ω : XM (ω) = M}

denotes the event: Plaintext message M ∈ M is sent, with probability Pr(XM =


M), i.e., Pr(XM = M) is the probability that the plaintext message is M. Likewise,

(XC = C) = {ω ∈ Ω : XC (ω) = C}

denotes the event: Ciphertext C ∈ C is received, with probability Pr(XC = C), i.e.,
Pr(XC = C) is the probability that the ciphertext is C.
Definition 8.5.8 A cryptosystem has perfect secrecy if intercepted ciphertext
reveals no new information about the corresponding plaintext. More precisely, a
cryptosystem has perfect secrecy if

Pr(XM = M|XC = C) = Pr(XM = M)

for all M ∈ M, C ∈ C.
Proposition 8.5.9 The Vernam cipher has perfect secrecy.
Proof Let k = k0 k1 . . . kr−1 be the random key and let M = M0 M1 . . . Mr−1 be
a message. Intuitively, the ciphertext C = e(M, k) = M ⊕ k, which is given by
bit-wise addition modulo 2, is just as random as k. Thus knowledge of C gives no
new information about the message M, i.e., for any M, C, the events (XM = M)
and (XC = C) are independent. The result follows.
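The independence claim can be checked by brute force for short messages: for a fixed r-bit message, XORing with all 2^r equally likely keys produces every ciphertext exactly once, so the ciphertext distribution is uniform no matter which message was sent. A small Python check (ours, for illustration):

```python
from itertools import product

def ciphertext_distribution(M, r):
    """Tally C = M XOR k over all 2^r equally likely keys k."""
    counts = {}
    m = int(M, 2)
    for bits in product('01', repeat=r):
        k = int(''.join(bits), 2)
        counts[m ^ k] = counts.get(m ^ k, 0) + 1
    return counts
```

Every one of the 2^r ciphertexts occurs exactly once, regardless of M.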


8.5.5 Unicity Distance of the Vernam Cipher

In the Vernam cipher, |K| = ∞. To see this, suppose that |K| = n < ∞.
Necessarily, n = 2^m for some m ≥ 1. Then the length of any message must be
≤ m since the length of the key must equal the length of the message. But one can
always write a message of length > m. Thus |K| = ∞.
Now assume that English messages are encoded in ASCII. As shown in
Section 3.2.1, the redundancy rate per byte is 6.5. Thus the redundancy rate per
bit is 6.5/8 = 0.8125.
Thus the unicity distance of the Vernam cipher is

u.d. = log2 (|K|)/0.8125 = ∞/0.8125 = ∞ characters.

This result is consistent with perfect secrecy: A brute-force attack by key trial cannot
be used to uniquely determine the key.

8.6 Stream Ciphers

Definition 8.6.1 (Stream Cipher) Let Σ = {0, 1} and M = C = {0, 1}∗ . A


message is a finite sequence of bits

M = M0 M1 M2 · · · Mm−1 , Mi ∈ {0, 1}

of length m. The secret key shared by Alice and Bob is a sequence of l bits

k = k0 k1 k2 · · · kl−1 ,

chosen uniformly at random from the set {0, 1}. Alice and Bob use this random
“seed" k to generate a longer sequence of m ≥ l bits,

b = b0 b1 b2 . . . bm−1

using a deterministic process; b is the key stream. The encryption of M is given as

C = e(M, b) = M ⊕ b,

and decryption is

M = d(C, b) = C ⊕ b.

(See Figure 8.3).

Fig. 8.3 A stream cipher

To generate the keystream b from the random seed k, Alice and Bob use a
function

G : {0, 1}l → {0, 1}m , k → b,

called a bit generator. Hence b is not random. Security of a stream cipher depends
on how well the output b = G(k) of the bit generator simulates a truly random
stream of bits. We will discuss bit generators in detail in Chapter 11.
Example 8.6.2 Define a bit generator G : {0, 1}3 → {0, 1}12 by the rule G(k) =
b = k^4 = kkkk. If the shared random seed is k = 101, then G(101) =
101101101101 and the encryption of the message M = 100101110001 is

C = 100101110001 ⊕ 101101101101 = 001000011100.
Example 8.6.2 shows that the Vigenère cipher with alphabet Σ = {0, 1} is a
special case of the stream cipher.
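The toy generator of Example 8.6.2 and the resulting stream cipher can be sketched in Python (ours; a realistic generator would be far less predictable, as discussed in Chapter 11):

```python
def repeat_generator(k, m):
    """Toy bit generator G: repeat the seed k until m bits are produced."""
    return (k * (m // len(k) + 1))[:m]

def stream_encrypt(M, k):
    b = repeat_generator(k, len(M))   # the key stream
    return ''.join('1' if x != y else '0' for x, y in zip(M, b))

stream_decrypt = stream_encrypt       # XOR with the same key stream
```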

8.7 Block Ciphers

As we saw in Section 8.3.1, the Hill cipher is a block cipher since it encrypts blocks
of 2-grams over the alphabet {A, B, . . . , Z}. In this section we discuss block ciphers
over the alphabet of bits.
Let m ≥ 1, n ≥ 1 be integers. Let {0, 1}m denote the set of all sequences of 0s
and 1s of length m, and let {0, 1}n denote the set of all sequences of 0s and 1s of
length n.
Definition 8.7.1 A block cipher is a symmetric key cryptosystem whose encryption transformation e is a function

e : {0, 1}m × {0, 1}n → {0, 1}m .

Here M = C = {0, 1}m . Thus a message M ∈ M is a sequence of bits of length m.


Note that the keyspace K = {0, 1}n ; a typical key k is a sequence of n bits.
Decryption is a function d

d : {0, 1}m × {0, 1}n → {0, 1}m .

A block cipher satisfies the fundamental relation:

d(e(M, k), k) = M,

for M ∈ {0, 1}m , k ∈ {0, 1}n .



8.7.1 Iterated Block Ciphers

An iterated block cipher is a block cipher in which encryption is achieved by


performing a finite sequence of rounds. Each round involves the application of a
function called the round function.

Feistel Ciphers

A Feistel cipher with r rounds is an iterated block cipher in which M = C =


{0, 1}2t , where t ≥ 1. Thus both the message space and the ciphertext space consist
of sequences of even length 2t. The keyspace is of the form K = {0, 1}t . The key
k ∈ {0, 1}t is used in some prescribed way to generate a set of r round keys ki ,
each of length t: k1 , k2 , k3 , . . . , kr . The round function has the form

f : {0, 1}t × {0, 1}t → {0, 1}t .

Encryption in a Feistel cipher proceeds as follows.


Step 1. Message M ∈ {0, 1}2t is broken into two blocks of length t:

M = (L0 , R0 ),

where L0 is the left block and R0 is the right block.


Step 2. Beginning with the initial blocks (L0 , R0 ), each of the r rounds produces
a new pair of blocks, in some prescribed manner:

(L0 , R0 ) −→ (L1 , R1 ) −→ (L2 , R2 ) −→ · · · −→ (Lr , Rr ),

where the ith arrow denotes round i.

Step 3. The final pair of blocks C = (Lr , Rr ) is the encryption of message M =


(L0 , R0 ).
In round i, 1 ≤ i ≤ r, new blocks are produced from previous blocks according
to formula:

Li = Ri−1
Ri = Li−1 ⊕ f (Ri−1 , ki ).

The ith round of the Feistel cipher is illustrated in Figure 8.4.


Decryption of the ciphertext C = (Lr , Rr ) proceeds by reversing the r rounds to
obtain M = (L0 , R0 ):

(Lr , Rr ) −→ (Lr−1 , Rr−1 ) −→ · · · −→ (L1 , R1 ) −→ (L0 , R0 ),

where the ith arrow denotes round i of decryption.

Fig. 8.4 The ith round of the Feistel cipher

The ith round of decryption yields



Lr−i = Rr−i+1 ⊕ f (Rr−i , kr−i+1 )
Rr−i = Lr−i+1 .

Example 8.7.2 In this 16-bit, 2-round Feistel cipher, the encryption transformation
is

e : {0, 1}16 × {0, 1}8 → {0, 1}16 .

Assume that the key k ∈ {0, 1}8 generates round keys

k1 = 01000111, k2 = 11100101,

and the round function

f : {0, 1}8 × {0, 1}8 → {0, 1}8

is defined as (a, b) → a ⊕ b. Let M = AA, which in ASCII appears as

0100000101000001.

We compute C = e(0100000101000001, k). Note that

L0 = 01000001, R0 = 01000001.

In round 1,

L1 = R0 = 01000001,
and

R1 = L0 ⊕ f (R0 , k1 )
= 01000001 ⊕ f (01000001, 01000111)
= 01000001 ⊕ (01000001 ⊕ 01000111)
= 01000001 ⊕ 00000110
= 01000111.

In round 2,

L2 = R1 = 01000111,

and

R2 = L1 ⊕ f (R1 , k2 )
= 01000001 ⊕ f (01000111, 11100101)
= 01000001 ⊕ (01000111 ⊕ 11100101)
= 01000001 ⊕ 10100010
= 11100011.

Thus

C = e(M, k) = (L2 , R2 ) = 0100011111100011.
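The two-round Feistel computation above, and its reversal, can be sketched in Python (ours), with the two halves represented as 8-bit integers and the round function f(a, b) = a ⊕ b of Example 8.7.2:

```python
def f(a, b):
    """Round function of Example 8.7.2: bit-wise XOR."""
    return a ^ b

def feistel_encrypt(L, R, round_keys):
    # Each round: L_i = R_(i-1), R_i = L_(i-1) XOR f(R_(i-1), k_i).
    for k in round_keys:
        L, R = R, L ^ f(R, k)
    return L, R

def feistel_decrypt(L, R, round_keys):
    # Reverse the rounds: R_(i-1) = L_i, L_(i-1) = R_i XOR f(L_i, k_i).
    for k in reversed(round_keys):
        L, R = R ^ f(L, k), L
    return L, R

# Example 8.7.2: M = AA with round keys 01000111 and 11100101.
assert feistel_encrypt(0b01000001, 0b01000001, [0b01000111, 0b11100101]) \
    == (0b01000111, 0b11100011)
```

Note that decryption works for any round function f, invertible or not, which is the main structural advantage of a Feistel cipher.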

The Data Encryption Standard (DES)

First devised in 1977, the Data Encryption Standard (DES) was the most widely
used and well-known symmetric key cryptosystem of the modern era. The DES is
a 16-round iterated block cipher with M = C = {0, 1}64 . DES is essentially a
Feistel cipher. However in DES the key k is an element of {0, 1}64 , which is used to
generate 16 round keys, each of which is an element of {0, 1}48 . Moreover, in DES
the round function is of the form

f : {0, 1}32 × {0, 1}48 → {0, 1}32 .



The Advanced Encryption Standard (AES)

Due to security concerns with DES, the Advanced Encryption Standard (AES)
was proposed by V. Rijmen and J. Daemen in the 1990s. In 2002, AES was accepted
by the US Government as the new standard for symmetric key encryption. The AES
is a 10-round iterated block cipher with M = C = {0, 1}128 . The key k is an element
of {0, 1}128 and is used to generate 10 round keys (as in DES).

8.8 Exercises

1. Let {σ1 , σ2 , σ3 , . . . , σ26! } be the set of permutations of the alphabet

A = {A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z},

and consider the simple substitution cryptosystem with


σk =
A B C D E F G H I J K L M
X M T A K Z Q B N O L E S

N O P Q R S T U V W X Y Z
I F G R V J H C Y U D P W
and key k.
(a) Compute C = e(AUBUR NUNIV ERSIT YATMO NTGOM ERY, k).
(b) Compute M = d(NIZFV SXHNF IHBKF VPXIA KIHVF GP, k).
2. Let M, C, e, d, K denote the simple substitution cryptosystem with Σ =
{0, 1, 2, 3, 4}, M = C = Σ∗ = {0, 1, 2, 3, 4}∗ , and

σk =
0 1 2 3 4
4 3 0 2 1

(a) Compute C = e(240014, k).


(b) Compute M = d(4130, k).
3. Let M, C, e, d, K denote the affine cipher with

Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15},

M = C = Σ∗ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}∗ ,

and key (a, b).


(a) Compute C = e(1 7 10, (7, 3)).
(b) Compute M = d(12 9, (3, 10)).


4. Let M, C, e, d, K denote the shift cipher with

Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

and M = C = Σ∗ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}∗ .


(a) Compute C = e(7280, 8).
(b) Compute M = d(22109, 5).
5. Suppose Σ = {0, 1, 2, 3, . . . , 34} is the alphabet for the affine cipher. Find the
size of the keyspace.
6. Assume that Alice and Bob are using the Hill cipher with key

A =
2 1
3 1
(a) Verify that A is a valid key.
(b) Compute M = d(WD, A).
(c) Show that there is at least one spurious key for C = WD.
7. Consider the following generalization of the Hill 2 × 2 cipher in which Σ = L3 ,
the collection of all 3-grams over the ordinary alphabet {A, B, . . . , Z}. Messages
are finite sequences of 3-grams,

M = M0 M1 · · · Mr−1 ,

where Mi = mi,1 mi,2 mi,3 , 0 ≤ i ≤ r − 1, mi,j ∈ Z26 . The encryption and
decryption keyspace is GL3 (Z26 ); encryption is given as

Ci = e(Mi , A) = (AMiT )T .

(a) Compute C = e(CAT, A), where

A =
1 0 0
1 1 0
1 1 1
(b) Compute the size of the keyspace.
(c) Compute the redundancy rate of plaintext English.
(d) Compute the unicity distance of the cryptosystem.
8. Suppose Malice intercepts the ciphertext
C=
LW VKRXBG EH REVHUYHG WKDW WKH HTXLYDBHQFHV VKRZQ LQ
FKDSWHU WZR EHWZHHQ WKH YDULRXV CRGHBV RI ILQLWH DXWRCDWD
DQG UHJXBDU HNSUHVVLRQV ZHUH HIIHFWLYH HTXLYDBHQFHV LQ
WKH VHQVH WKDW DBJRULWKCV ZHUH JLYHQ WR WUDQVBDWH IURC
RQH UHSUHVHQWDWLRQ WR DQRWKHU
which is known to have been encrypted using a simple substitution cryptosystem with permutation σk . Using frequency analysis, determine which of the
following values of σk (T) is most likely to have been used for encryption.
(a) σk (T) = N.
(b) σk (T) = W.
(c) σk (T) = X.
9. Referring to Exercise 8, use frequency analysis to decrypt C.
10. Suppose that Alice and Bob are using the simple substitution cryptosystem
on the standard 26 letter alphabet and Malice has obtained 10 characters of
ciphertext in a transmission between Alice and Bob. Malice uses the brute-
force method of key trial to obtain the key. (Assume that Malice has unlimited
computing power.) Prove that there is at least one spurious key for the
ciphertext.
11. Assume that the alphabet {A, B, . . . , Z} is encoded in ASCII as bytes 01000001,
01000010, . . . , 01011010. Compute the redundancy per letter (per byte) of
plaintext English with respect to this encoding. How does the redundancy rate
compare to the standard value of 3.2 bits/letter?
13. Let M be the Preface paragraph from Section 8.4. Let g : L1 → [0, 1] be the
1-gram relative frequency distribution of M and let f1 : L1 → [0, 1] be the
accepted 1-gram relative frequency distribution of English (Figure 3.3).
Compute the total variation distance

Dg = (1/2) Σα∈L1 |g(α) − f1 (α)|.

Compare Dg to the total variation distance

Dh = (1/2) Σα∈L1 |h(α) − f1 (α)|,

where h : L1 → [0, 1] is the relative frequency distribution of the atomic


elements (see Figure 8.2).
14. Let M, C, e, d, K denote the Vigenère cipher with Σ = {0, 1, 2, . . . , 25} and
M = C = Σ∗ = {0, 1, 2, . . . , 25}∗ .
(a) Compute C = e(4 17 19 14 22 4 17, 20 18 6).
(b) Compute M = d(13 20 3 3 2 3, 2 12).
15. Let M, C, e, d, K denote the Vigenère cipher with Σ = {0, 1, 2, 3, 4}
and M = C = Σ∗ = {0, 1, 2, 3, 4}∗ . It is known that 3124/1200 is a
plaintext/ciphertext pairing.
(a) Under the assumption that the key is k = k0 k1 , compute M = d(112, k0 k1 ).
(b) Prove that the key cannot have length 3.

16. Let M, C, e, d, K denote the Vernam cipher with Σ = {0, 1} and M = C =
Σ∗ = {0, 1}∗ .

(a) Compute C = e(01100 00000 11000, 10110 00110 10101).


(b) Compute M = d(10010 00010, 10101 01010).
17. Consider the 16-bit, 2-round Feistel cipher with Σ = {0, 1} and M = C =
Σ16 = {0, 1}16 . Assume that the key k generates the round keys k1 , k2 . The
round function

f : {0, 1}8 × {0, 1}8 → {0, 1}8

is defined as f (a, b) = a ⊕ b.
(a) Compute C = e(01000110 01001100, k), with k1 = 10000000, k2 =
00000001.
(b) Compute M = d(00000000 01011010, k), with k1 = 00000000, k2 =
00000000.
Chapter 9
Public Key Cryptography

9.1 Introduction to Public Key Cryptography

As we have seen, a cryptosystem is a system of the form

M, C, e, d, Ke , Kd ,

where M is the message space, C is the space of all possible cryptograms, e is the
encryption transformation, d is the decryption transformation, Ke is the encryption
keyspace, and Kd is the decryption keyspace.
More formally, the encryption transformation is a function

e : M × Ke → C.

Indeed, for message M ∈ M and encryption key ke ∈ Ke ,

e(M, ke ) = C ∈ C.

The decryption transformation is a function

d : C × Kd → M.

Given the ciphertext C = e(M, ke ), there is a decryption key kd ∈ Kd so that

d(e(M, ke ), kd ) = M.

If k = ke = kd is a shared secret key, then the cryptosystem is a symmetric


key cryptosystem. In a symmetric key cryptosystem, once we know the encryption
transformation e and the encryption key k, it is “easy” to compute the ciphertext
e(M, k), i.e., e(M, k) can be computed using a polynomial time algorithm.

© Springer Nature Switzerland AG 2022 173


R. G. Underwood, Cryptography for Secure Encryption, Universitext,
https://doi.org/10.1007/978-3-030-97902-7_9

Likewise, if we know the key k (and hence kd ), then it is “easy” to invert e(M, k),
i.e., given the ciphertext C = e(M, k), there is a polynomial time algorithm that
computes the unique M for which e(M, k) = C. One has M = d(C, k) =
d(e(M, k), k).
In public key cryptography, we use a different kind of encryption transformation
e. As in a symmetric key cryptosystem, it should be easy to compute C = e(M, ke )
knowing ke , but unlike a symmetric key cryptosystem, it should be hard to invert
e even with knowledge of ke . In fact, the encryption key ke is made public and is
called the public key; it is known to everyone (including Malice!). It should be
easy however to invert e(M, ke ) if one knows an additional piece of information:
the decryption key kd , which is called the trapdoor or private key.
Security in a public key cryptosystem depends on the secrecy of kd . It is essential
therefore that ke = kd . For this reason, public key cryptosystems are also called
asymmetric key cryptosystems.
The type of encryption function that we want to use in a public key cryptosystem
is called a “one-way trapdoor” function. In order to define such a function, we
introduce the idea of a “negligible” function.

9.1.1 Negligible Functions

Let R+ = {x ∈ R : x > 0}.


Definition 9.1.1 A function r : R+ → R+ is negligible if for each positive
polynomial w(x) ∈ Z[x], there exists an integer l0 ≥ 1 for which r(l) < 1/w(l)
whenever l ≥ l0 .
For example, r(x) = 1/2^x is negligible since for any positive polynomial w(x),
there exists l0 for which w(l)/2^l < 1 whenever l ≥ l0 . We have

lim_{x→∞} (1/2^x )/(1/w(x)) = lim_{x→∞} w(x)/2^x = 0,

by L’Hôpital’s rule. A negligible function goes to 0 faster than the reciprocal of any
positive polynomial.
The concept of a negligible function is related to the notion that efficient
algorithms run in polynomial time. If algorithm A runs in polynomial time as
a function of input size l, then repeating A w(l) times, where w is a positive
polynomial, results in a new algorithm that also runs in polynomial time and hence
is efficient. But when the probability that an algorithm successfully computes a
function value is a negligible function of l, then repeating it a polynomial number
of times will not change that fact.
Proposition 9.1.2 Let w be a positive polynomial. Suppose that the probability that
algorithm A correctly computes a function value f (n) for some instance n of size
l is a negligible function of l. Then the probability that A correctly computes f (n)


at least once when repeated w(l) times is also negligible, i.e., the probability is a
negligible function of l.
Proof Let P = Pr(A(n) = f (n)) for some input n of size l. Let Pr(ξw(l) = i)
denote the probability that A correctly computes the function value exactly i times
when A is repeated w(l) times, cf. Section 2.4, Example 2.4.2.
Then, the probability that A correctly computes the function value at least once
when repeated w(l) times is


Pr(ξw(l) ≥ 1) = Σ_{i=1}^{w(l)} Pr(ξw(l) = i) = Σ_{i=1}^{w(l)} (w(l) choose i) P^i (1 − P )^{w(l)−i} .

We claim that Pr(ξw(l) ≥ 1) is a negligible function of l. To this end, observe that

w(l)P ≥ Σ_{i=1}^{w(l)} (w(l) choose i) P^i (1 − P )^{w(l)−i} ,

thus

w(l)P ≥ Pr(ξw(l) ≥ 1). (9.1)

By way of contradiction, suppose that Pr(ξw(l) ≥ 1) is not negligible. Then there


exists a positive polynomial v(x) for which

Pr(ξw(l) ≥ 1) ≥ 1/v(l)

for infinitely many l. Then by (9.1), we have

P ≥ 1/(w(l)v(l)),

for infinitely many l. Since w(x)v(x) is a positive polynomial, this implies that P is not
negligible, a contradiction.


9.1.2 One-Way Trapdoor Functions

Definition 9.1.3 An encryption function e(x, ke ) : M → C is a one-way trapdoor


function if
(i) e(M, ke ) can be computed by a polynomial time algorithm.
(ii) Given a positive polynomial w and a probabilistic polynomial time algorithm


A, with input C = e(M, ke ), where M is a random element of M of size l, we
have

Pr(A(e(M, ke )) = M) < 1/w(l),

for l sufficiently large. Less formally, the probability that A successfully inverts
the function e(x, ke ) is a negligible function of l, even with knowledge of e and
ke .
(iii) However, with an additional piece of information (the trapdoor kd ), e(M, ke )
can be inverted in polynomial time.


Unfortunately, it has not been proven that one-way trapdoor functions exist.
In fact, the existence of such functions is related to deep questions in complexity
theory, see [59, Theorem 6.6].
Nevertheless, there are some very good candidates for one-way trapdoor func-
tions that are used to construct public key cryptosystems. The inputs for these
potential one-way functions will be integers; the size of the integers will be
measured in bits.
Let l ≥ 2. An l-bit prime is a prime number p with 2^(l−1) + 1 ≤ p ≤ 2^l − 1. For
such primes, l = ⌊log2 (p)⌋ + 1 and so p requires l bits to represent it in binary; the
prime p is of size l.
For example, the 5-bit primes are primes that satisfy 17 ≤ p ≤ 31, and thus, they
are 17, 19, 23, 29, 31. Note that (17)₂ = 10001, (19)₂ = 10011, (23)₂ = 10111,
(29)₂ = 11101, and (31)₂ = 11111.
A prime p is Mersenne if p = 2^l − 1 for l ≥ 2. If p = 2^l − 1 is Mersenne, then
l is prime. The binary representation of a Mersenne prime 2^l − 1 is a string of l
ones; a Mersenne prime is an l-bit prime.
Since there are an infinite number of primes (see [47, Theorem 3.1]), there are
l-bit primes for arbitrarily large l ≥ 2.
Here is our first candidate for a one-way trapdoor function.
Let l ≥ 1. Let p and q be distinct primes in which the smaller prime is an l-bit
prime. Let s be an integer with the following properties: 1 < s < (p − 1)(q − 1)
and gcd(s, (p − 1)(q − 1)) = 1. Let n = pq, and let Σ = {0, 1, 2, 3, . . . , n − 1}
denote the set of n letters. Define a function e(x, (s, n)) : Σ → Σ by the rule

e(x, (s, n)) = (x^s mod n),

for x ∈ Σ.
Then e(x, (s, n)) is a possible one-way trapdoor function. It can be shown that
conditions (i) and (iii) of Definition 9.1.3 hold:
(i) e(x, (s, n)) can be computed by a polynomial time algorithm.
(iii) e(x, (s, n)) can be inverted in polynomial time with the trapdoor.

Moreover, it is assumed that condition (ii) of Definition 9.1.3 holds (see
Section 9.3):
(ii) Given a positive polynomial w and a probabilistic polynomial time algorithm
A with input e(x, (s, n)), x ∈R Σ, and output y ∈ Σ, we assume that

Pr(A(e(x, (s, n))) = x) < 1/w(l)

for l sufficiently large, i.e., we assume the probability that A successfully inverts
e(x, (s, n)) = (x^s mod n) is negligible, i.e., is a negligible function of l, even
with knowledge of s, n.
Note: here and elsewhere, the notation x ∈R S means that the element x is chosen
uniformly at random from the set S.
The function e(x, (s, n)) : Σ → Σ is the basis for our first example of a public
key cryptosystem.

9.2 The RSA Public Key Cryptosystem

Definition 9.2.1 (RSA Public Key Cryptosystem) Alice wants to
send a secret message to Bob. Let p and q be distinct large prime numbers, and let
s be an integer with

1 < s < (p − 1)(q − 1) and gcd(s, (p − 1)(q − 1)) = 1.

Let n = pq, and let Σ = {0, 1, 2, 3, . . . , n − 1} denote the set of n letters. We let
M = C = Σ∗. A message M ∈ M is a finite sequence of letters in Σ. The pair
(s, n) is Bob’s public key (the encryption key), which he publishes for all to see.
Alice looks up Bob’s public key (s, n) and encrypts the message M =
M0 M1 M2 · · · Mr−1 letter by letter to form the ciphertext
C = C0 C1 C2 · · · Cr−1 using the rule

Ci = e(Mi , (s, n)) = (Mi^s mod n)

for 0 ≤ i ≤ r − 1. She then sends the ciphertext C to Bob, who will decrypt.
Bob’s private key (the trapdoor) is the unique integer t with the following
properties:

1 < t < (p − 1)(q − 1) and st ≡ 1 (mod (p − 1)(q − 1)).



Decryption of the ciphertext C = C0 C1 · · · Cr−1 proceeds letter by letter using the
formula

Mi = d(Ci , t) = (Ci^t mod n)

for 0 ≤ i ≤ r − 1.
This cryptosystem is the RSA public key cryptosystem. 

RSA is so named since it was developed by R. Rivest, A. Shamir, and L. Adleman
in 1977. Today, RSA is by far the most widely used public key cryptosystem in the
world.
Proposition 9.2.2 The RSA cryptosystem works.
Proof We show that (8.1) holds. Let M ∈ Σ. Then

d(e(M, (s, n)), t) = d((M^s mod n), t)
= ((M^s mod n)^t mod n)
= ((M^st mod n) mod n)
= (M^st mod n).

There exists an integer a with st = 1 + a(p − 1)(q − 1); thus

M^st ≡ M · M^(a(p−1)(q−1)) ≡ M (mod p),

M^st ≡ M · M^(a(p−1)(q−1)) ≡ M (mod q)

by Fermat’s Little Theorem. (If p | M, then both sides of the first congruence are
≡ 0 (mod p) and it holds trivially; likewise for q.)


Now both M and M^st are solutions to the system of congruences

x ≡ M (mod p)
x ≡ M (mod q).

Thus by the Chinese Remainder Theorem,

M^st ≡ M (mod n),

and hence (M^st mod n) = (M mod n) = M, as required. 



Example 9.2.3 Bob chooses primes p = 631 and q = 2153. Then n =
(631)(2153) = 1358543. Consequently, Σ = {0, 1, 2, 3, . . . , 1358542} and M =
C = Σ∗. Note that (p − 1)(q − 1) = (630)(2152) = 1355760.
To create his public key, Bob chooses s with the properties: 1 < s < 1355760
and gcd(s, 1355760) = 1. His choice is s = 5359. Bob’s public key is now the pair

(5359, 1358543) which he publishes. (For security, Bob also destroys the primes p
and q.)
Bob then computes his private key by computing the unique integer t that satisfies
1 < t < 1355760 and 5359t ≡ 1 (mod 1355760). In fact, t = 20239. Thus Bob’s
private key is 20239.
Alice now encrypts the message M = 413 7000 letter by letter to yield

C0 = e(M0 , (s, n))
= e(413, (5359, 1358543))
= (413^5359 mod 1358543)
= 1311697,

C1 = e(M1 , (s, n))
= e(7000, (5359, 1358543))
= (7000^5359 mod 1358543)
= 1262363.

Thus C = 1311697 1262363.


Alice then sends the ciphertext C = 1311697 1262363 to Bob who decrypts
letter by letter using his private key t:

M0 = d(C0 , t)
= d(1311697, 20239)
= (1311697^20239 mod 1358543)
= 413.

M1 = d(C1 , t)
= d(1262363, 20239)
= (1262363^20239 mod 1358543)
= 7000.

Thus Bob recovers the message M = 413 7000. 
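The computations of this example are easily reproduced; the sketch below uses Python's three-argument pow for modular exponentiation and pow(s, -1, phi) (Python 3.8+) for the modular inverse (illustrative code; the book's own programs are written in GAP):

```python
p, q = 631, 2153
n = p * q                        # 1358543, the RSA modulus
phi = (p - 1) * (q - 1)          # 1355760

s = 5359                         # Bob's public exponent, gcd(s, phi) = 1
t = pow(s, -1, phi)              # Bob's private key: s*t ≡ 1 (mod phi); t = 20239

def rsa_encrypt(M, s, n):
    """Encrypt a message letter by letter: Ci = Mi^s mod n."""
    return [pow(Mi, s, n) for Mi in M]

def rsa_decrypt(C, t, n):
    """Decrypt letter by letter: Mi = Ci^t mod n."""
    return [pow(Ci, t, n) for Ci in C]

M = [413, 7000]
C = rsa_encrypt(M, s, n)         # [1311697, 1262363], as in the example
print(rsa_decrypt(C, t, n))      # recovers [413, 7000]
```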




9.3 Security of RSA

The security of the RSA cryptosystem depends on the secrecy of Bob’s private key
(the trapdoor). It also depends on the assumption that the RSA encryption function
is a one-way trapdoor function: it is hard to invert e(M, (s, n)) knowing only e and
the public key (s, n). But what evidence do we have that RSA encryption is hard to
invert?
RSA encryption is related to another function that is (supposedly) hard to
invert. Let P = {2, 3, 5, 7, 11, 13, . . . } denote the set of all prime numbers, let
N = {1, 2, 3, . . . }, and define

PMULT : P × P → N,

by the rule

PMULT(p, q) = pq = n.

The inverse of PMULT is the factorization n = pq; factorization is assumed to be


hard to compute.
This is formalized in the following assumption.
The Factoring Assumption (FA) Let w(x) ∈ Z[x] be a positive polynomial, let p
and q be randomly chosen l-bit primes, and let n = pq. Let A be a probabilistic
polynomial time algorithm with input n and output A(n) = (p, q) ∈ P × P, where
P denotes the set of all primes. Then there exists an integer l0 for which

1
Pr(A(n) = (p, q)) <
w(l)

whenever l ≥ l0 .
In other words, the probability Pr(A(n) = (p, q)) as a function of l is a negligible
function of l.
The FA says that the composite n cannot be factored in polynomial time. This is
the basis for the security of RSA.
Proposition 9.3.1 Assume that the RSA cryptosystem has public key (s, n) and
private key t. If n can be factored, then RSA is insecure.
Proof Suppose that n can be factored into the prime numbers p and q. Then the
integer (p − 1)(q − 1) is known. Since gcd(s, (p − 1)(q − 1)) = 1, s is a unit in
Z(p−1)(q−1) , and the private key t can be computed as t = s −1 in U (Z(p−1)(q−1) ).
Indeed, t can be found in polynomial time using the Euclidean algorithm. 

The contrapositive of Proposition 9.3.1 holds.

Corollary 9.3.2 If RSA is secure, then factoring is hard, that is, there is no
polynomial time algorithm for inverting PMULT.
Unfortunately, we do not know if the converse of Corollary 9.3.2 holds; in other
words, if factoring is hard, does that guarantee that RSA is secure?
In light of Proposition 9.3.1, attacks on RSA attempt to factor the RSA modulus
n = pq. In what follows, we review some standard ways to factor the RSA modulus.
In view of the FA, the algorithms we present necessarily run in non-polynomial time.
To begin with, we can always use the naive approach of algorithm PRIME
(Algorithm 4.3.2) to factor n. From Proposition 4.3.1, we know that n has a prime
factor ≤ √n. So we just check each integer j , 2 ≤ j ≤ √n, to see if it is a divisor
of n. In this manner, we can find a factor of n in O(√n) steps.
J. Pollard has given two methods that improve on this naive approach.

9.3.1 Pollard p − 1

Let n be a large composite integer that is a product of primes n = pq, p, q ≥ 3.


Suppose that p has the property that the prime factorization of p − 1 contains many
small primes with small exponents. For instance, suppose p = 4621, so that

p − 1 = 4620 = 2^2 · 3 · 5 · 7 · 11.

In this case, J. M. Pollard [42] has developed a method for factoring n. We note that
Pollard’s method can be applied to factor an arbitrary integer.
Pollard’s method is based on the observation that since p − 1 is divisible by only
small primes with small exponents, there exists a not-too-large integer m so that
(p − 1) | m!. For example, if p − 1 = 4620 = 2^2 · 3 · 5 · 7 · 11, then we can choose
m = 11, and we see that p − 1 divides 11!. The integer m provides an upper bound
on the number of iterations in Pollard’s algorithm.
The core idea behind Pollard’s p − 1 algorithm can be summarized as follows.
Since (p − 1) | m!, there exists an integer k so that (p − 1)k = m!. Since p ≥ 3,
gcd(2, p) = 1, and so by Fermat’s Little Theorem,

2^(m!) ≡ 2^((p−1)k) ≡ (2^(p−1))^k ≡ 1 (mod p).

Thus p | (2^(m!) − 1). Since gcd(p, n) = p,

p ≤ gcd(2^(m!) − 1, n) ≤ n.

If q ∤ (2^(m!) − 1), then

p ≤ gcd(2^(m!) − 1, n) < n,

and thus p = gcd(2^(m!) − 1, n) is a prime factor of n.



On the other hand, if q | (2^(m!) − 1), then gcd(2^(m!) − 1, n) = n, and we do not
recover the factor p of n.
We can reduce the chance that q | (2^(m!) − 1) to near zero by requiring that
(q − 1) ∤ m!. For then, by the division algorithm, m! = (q − 1)l + r, 0 < r < q − 1;
hence

2^(m!) ≡ 2^((q−1)l) 2^r ≡ 2^r (mod q).

So if q | (2^(m!) − 1), then r would have to be a multiple of the order of 2 in U (Zq ),
which is unlikely.
Here is Pollard’s p − 1 algorithm.
Algorithm 9.3.3 (POLLARD_p − 1)
Input: a large integer n, which is the product of primes n = pq
with p, q ≥ 3, a not-too-large integer bound m
Output: a prime factor p
Algorithm:
for k = 2 to m do
dk ← gcd(2^(k!) − 1, n)
if 1 < dk < n, then output p = dk
next k



Example 9.3.4 We use POLLARD_p − 1 to factor n = 21436819. We set the
bound m. After 10 iterations of the for-next loop, we arrive at the computations (the
integers 2^(k!) − 1 are reduced modulo 21436819)

gcd(2^(2!) − 1, 21436819) = gcd(3, 21436819) = 1
gcd(2^(3!) − 1, 21436819) = gcd(63, 21436819) = 1
⋮
gcd(2^(9!) − 1, 21436819) = gcd(10391831, 21436819) = 1
gcd(2^(10!) − 1, 21436819) = gcd(3812898, 21436819) = 1
gcd(2^(11!) − 1, 21436819) = gcd(12162472, 21436819) = 4621.

The algorithm then outputs the prime factor p = 4621. The other prime factor is
q = 21436819/4621 = 4639. 
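Algorithm 9.3.3 translates directly into a short program; the sketch below (illustrative Python; the book's own programs use GAP) maintains a = 2^(k!) mod n iteratively via a ← a^k (mod n), so no factorial is ever computed explicitly:

```python
from math import gcd

def pollard_p_minus_1(n, m):
    """Pollard's p - 1 method: return a nontrivial factor of n, or None.

    a holds 2^(k!) mod n; the update a <- a^k (mod n) raises the
    exponent from (k-1)! to k!.
    """
    a = 2
    for k in range(2, m + 1):
        a = pow(a, k, n)
        d = gcd(a - 1, n)
        if 1 < d < n:
            return d
    return None

p = pollard_p_minus_1(21436819, 15)   # Example 9.3.4
print(p, 21436819 // p)               # 4621 4639
```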

The Pollard p − 1 algorithm ran efficiently in Example 9.3.4 because p − 1 =
4620 = 2^2 · 3 · 5 · 7 · 11 is a product of small primes with small exponents. Moreover,
we found a non-trivial factor since q = 4639 ∤ (2^(11!) − 1).

It was highly unlikely that q = 4639 | (2^(11!) − 1) since q − 1 = 4638 ∤ 11!;
q − 1 = 4638 ∤ 11! because 4638 has prime factor decomposition 2 · 3 · 773 and
hence is not a product of primes ≤ 11.
Pollard p − 1 will not work efficiently in the case that the primes in the product
n = pq are so that both p − 1 and q − 1 have large prime factors or large prime
powers in their decompositions.
Example 9.3.5 We use POLLARD_p − 1 to factor n = 1219. We set the bound m.
After 10 iterations of the for-next loop, we arrive at the computations (the integers
2^(k!) − 1 are reduced modulo 1219)

gcd(2^(2!) − 1, 1219) = gcd(3, 1219) = 1
gcd(2^(3!) − 1, 1219) = gcd(63, 1219) = 1
⋮
gcd(2^(9!) − 1, 1219) = gcd(277, 1219) = 1
gcd(2^(10!) − 1, 1219) = gcd(1207, 1219) = 1
gcd(2^(11!) − 1, 1219) = gcd(575, 1219) = 23.

The algorithm then outputs the prime factor p = 23. The other prime factor is
q = 1219/23 = 53. 

The computation in Example 9.3.5 is inefficient; it has a large run time relative
to the size of the input. In this case the prime factor decompositions are

p − 1 = 22 = 2 · 11,

q − 1 = 52 = 2^2 · 13,

which both contain large prime factors relative to the size of p and q.

9.3.2 Pollard ρ

Along with the p − 1 method, J. M. Pollard [43] has developed a probabilistic


algorithm for factoring n = pq; there is a high probability that the algorithm will
output a prime factor of n.

Here is Pollard’s algorithm.


Algorithm 9.3.6 (POLLARD_ρ)
Input: a large integer n, which is the product of primes n = pq
with p, q ≥ 3, an “average” function f (x) = x^2 + 1,
a not-too-large integer bound m
Output: a prime factor p
Algorithm:
x0 ← 2
for k = 1 to m do
xk ← (f (xk−1 ) mod n)
next k
for k = 1 to m do
dk = gcd(x2k − xk , n)
if 1 < dk < n then output p = dk
next k



The algorithm is based on an application of Proposition 2.3.1.
Proposition 9.3.7 Let n be a large composite integer of the form n = pq, where p
and q are primes, p, q ≥ 3. Let C > 0 be a real number. Let m = 1 + ⌈√(2pC)⌉. Then
randomly choosing a sequence of m terms of Zp (with replacement) guarantees that
a collision occurs with probability at least 1 − e^(−C).
Proof Take S = Zp and N = p in Proposition 2.3.1. Let y1 , y2 , . . . , ym be a
sequence of terms in Zp . Then the probability that there is a collision is

1 − ∏_{i=1}^{m−1} (1 − i/p),

which by (2.3) is at least 1 − e^(−m(m−1)/(2p)). Now, m(m−1)/(2p) > √(2pC) · √(2pC)/(2p) = C, thus
e^(−m(m−1)/(2p)) < e^(−C), and so

1 − e^(−m(m−1)/(2p)) > 1 − e^(−C).



Randomly choosing a sequence of terms in Zp is equivalent to choosing an
“average” function f : Zp → Zp and seed x0 , and defining a sequence of terms
iteratively:

x0 , x1 = f (x0 ), x2 = f (f (x0 )), x3 = f (f (f (x0 ))), . . . (9.2)

(In POLLARD_ρ, we choose f (x) = x^2 + 1 with seed x0 = 2.)



In the sequence {xi }i≥0 defined as such, we find the point where the first collision
occurs, i.e., we find the smallest j , j > i for which

xi ≡ xj (mod p). (9.3)

Proposition 9.3.7 says that it is very likely that this first collision occurs in the first
m = O(√p) terms of {xi } modulo p.
Now (9.3) tells us that the period of the sequence {xi }i≥0 modulo p is j − i. Since
j − i ≤ j , there exists a largest integer s for which s(j − i) ≤ j . We claim that
i ≤ s(j − i). For if not, then i > s(j − i), or (s + 1)i > sj , or (s + 1)(j − i) < j ,
a contradiction. Thus,

i ≤ s(j − i) ≤ j.

Put k = s(j − i). Then

x2k ≡ xk (mod p)

for some k, whose value is O(√p).
POLLARD_ρ finds the first collision modulo p, i.e.,

x2k ≡ xk (mod p),

which is not a collision modulo n, i.e.,

x2k ≢ xk (mod n).

Then 1 < gcd(x2k − xk , n) < n, and we have found the factor p.


It follows from Proposition 9.3.7 that Pollard ρ will likely compute a prime factor
p of n after O(√p) iterations of the for-next loop. Of course, in practice, we will
not know the value of p. By Proposition 4.3.1, p ≤ √n, and thus Pollard ρ will
likely compute a prime factor p of n after O(n^(1/4)) iterations of the for-next loop.
Example 9.3.8 We use POLLARD_ρ to factor n = 8131. With x0 = 2 and f (x) =
x^2 + 1, the sequence of terms modulo 8131 is
2, 5, 26, 677, 2994, 3675, 35, 1226, 6973, 7481, 7820, 7281, 6973, 7481,
7820, 7281, 6973, . . .
We next compute

gcd(x2 − x1 , 8131) = gcd(26 − 5, 8131) = 1


gcd(x4 − x2 , 8131) = gcd(2994 − 26, 8131) = 1
gcd(x6 − x3 , 8131) = gcd(35 − 677, 8131) = 1
gcd(x8 − x4 , 8131) = gcd(6973 − 2994, 8131) = 173.

The algorithm then outputs the prime factor p = 173. The other prime factor is
q = 8131/173 = 47. 
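An implementation need not store the whole sequence: Floyd's cycle-finding trick keeps two iterates, advancing one by f and the other by f ∘ f, so that at step k they hold x_k and x_{2k}. An illustrative sketch in Python (the book's own programs use GAP):

```python
from math import gcd

def pollard_rho(n, max_iter=10**6):
    """Pollard's rho with f(x) = x^2 + 1 and seed x0 = 2.

    At step k, x holds x_k and y holds x_{2k}; a factor is found
    when gcd(x_{2k} - x_k, n) is a nontrivial divisor of n.
    """
    f = lambda v: (v * v + 1) % n
    x = y = 2
    for _ in range(max_iter):
        x = f(x)          # advance one step:  x_k
        y = f(f(y))       # advance two steps: x_{2k}
        d = gcd(x - y, n)
        if 1 < d < n:
            return d
    return None           # no factor found within the bound

p = pollard_rho(8131)     # Example 9.3.8
print(p, 8131 // p)       # 173 47
```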

Remark 9.3.9 In Example 9.3.8, the terms taken modulo 173 and 47, respectively,
are
2, 5, 26, 158, 53, 42, 35, 15, 53, 42, 35, 15, 53, 42, 35, 15, 53, . . .
2, 5, 26, 19, 33, 9, 35, 4, 17, 8, 18, 43, 17, 8, 18, 43, 17, . . .
For these moduli, the sequences x0 , x1 , x2 , . . . are eventually periodic with
period 4 (see Section 11.1). Pollard’s method is called Pollard “ρ” because the
periodicity of these sequences suggests the shape of the Greek letter ρ when typed.
For instance, the non-periodic initial sequence 2, 5, 26, 158 corresponds to the tail
of “ρ,” while the periodic part

42, 35, 15, 53, 42, 35, 15, 53, 42, 35, 15, 53, . . .

corresponds to the loop of “ρ.” 



Here is a GAP program for finding a prime factor of n using POLLARD_ρ.
x:=List([1..m]);
x[1]:=2;
for k in [2..m] do
x[k]:=(x[k-1]^2+1) mod n;
od;
d:=List([1..m]);
for j in [2..QuoInt(m,2)] do
d[j-1]:=GcdInt(x[2*j-1]-x[j],n);
if 1 < d[j-1] and d[j-1] < n then
Print("a prime factor of"," ", n," ","is"," ",d[j-1]);
break;
fi;
od;

In Example 9.3.8, we used the function f : Z8131 → Z8131 defined as f (x) =
x^2 + 1 to generate the sequence x0 , x1 , x2 , . . . ; f is an average “typical” function;
it is not a bijection. For instance,

f (1) = f (518) = f (7613) = f (8130) = 2.

In fact, if we had chosen a bijection f : Z8131 → Z8131 , it would not be as useful


for Pollard ρ.
Proposition 9.3.10 Suppose S is a finite set with N = |S|. Let f : S → S be a
permutation of S (a bijection from S to itself). Let x0 ∈ S and define a sequence xi =
f (xi−1 ) for i ≥ 1. Let k be the smallest integer, k ≥ 1, for which x0 , x1 , . . . , xk−1
are distinct and xk = xj for some 0 ≤ j ≤ k − 1.
(i) Pr(k = 1) = Pr(k = 2) = Pr(k = 3) = · · · = Pr(k = N) = 1/N.

(ii) The expected value of k over all permutations f : S → S is (N +1)/2 = O(N).


Proof See [32, p. 198, 5(a),(b)]. 

Thus if f : Zn → Zn is a bijection in Pollard ρ, then the algorithm would likely
terminate after O(√n) iterations, which is much less efficient than O(n^(1/4)).

9.3.3 Difference of Two Squares

Let n = pq be a product of distinct odd primes p, q, p > q. We discuss other
algorithms for factoring n. These methods depend on the familiar difference of two
squares factorization

x^2 − y^2 = (x + y)(x − y).

Fermat Factorization

Our first algorithm is called Fermat factorization. The idea behind this algorithm is
to find an integer x so that x^2 − n is a square of some integer y. For then, x^2 − n = y^2;
thus, n = x^2 − y^2 = (x + y)(x − y), and so p = x + y and q = x − y are the prime
factors of n.
Since we require that x^2 − n is a square of an integer, we can assume that x^2 − n ≥
0, thus x ≥ √n. Thus the algorithm begins the search process with x = ⌈√n⌉. If
x^2 − n is a square, then we are done, else we check whether (x + 1)^2 − n is a square,
and so on.
The algorithm will always succeed in finding an x so that x^2 − n is a square.
Indeed, if x = (p + q)/2, then

x^2 − n = ((p + q)/2)^2 − n
        = (1/4)(p^2 + 2pq + q^2) − pq
        = (1/4)p^2 − (1/2)pq + (1/4)q^2
        = ((p − q)/2)^2.

We have 1 < p and q < n, and so the largest x we have to check is ≤ n.



Here is the algorithm.


Algorithm 9.3.11 (FERM_FACT)
Input: an integer n, which is the product of primes n = pq
with p, q ≥ 3.
Output: a prime factor p
Algorithm:
for i = ⌈√n⌉ to n do
if i^2 − n is a square of an integer j , then
output p = i + j
next i



Example 9.3.12 We use FERM_FACT to factor n = 1012343. We start with i =
⌈√1012343⌉ = 1007 and check

1007^2 − 1012343 = 1706, (not a square)
1008^2 − 1012343 = 3721 = 61^2 (is a square).

Thus, p = i + j = 1008 + 61 = 1069 is a prime factor of 1012343; the other factor


is 1008 − 61 = 947. 

Example 9.3.13 We use FERM_FACT to factor n = 1967. We start with i =
⌈√1967⌉ = 45 and check

45^2 − 1967 = 58, (not a square)
46^2 − 1967 = 149, (not a square)
47^2 − 1967 = 242, (not a square)
⋮
143^2 − 1967 = 18482, (not a square)
144^2 − 1967 = 18769 = 137^2 (is a square).

Thus p = i + j = 144 + 137 = 281 is a prime factor of 1967; the other factor is
144 − 137 = 7. 
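Algorithm 9.3.11 in Python, using math.isqrt for exact integer square roots (an illustrative sketch; the book's own programs use GAP):

```python
from math import isqrt

def fermat_factor(n):
    """Fermat factorization of odd n = pq: search for x with x^2 - n a square."""
    x = isqrt(n)
    if x * x < n:                 # start at ceil(sqrt(n))
        x += 1
    while x <= n:
        y = isqrt(x * x - n)
        if y * y == x * x - n:    # x^2 - n is a perfect square
            return x + y, x - y
        x += 1
    return None

print(fermat_factor(1012343))     # (1069, 947): p and q close, found quickly
print(fermat_factor(1967))        # (281, 7): p and q far apart, a long search
```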

Proposition 9.3.14 The running time of FERM_FACT is O(n).
Proof The algorithm will output a factor in at most

(p + q)/2 − ⌈√n⌉ + 1 < n − ⌈√n⌉ + 1

iterations, and thus FERM_FACT runs in time O(n). 





If p ≈ q, then (p + q)/2 ≈ p ≈ √n. So when p and q are close in value, the
algorithm finds a factor of n very quickly, as we have seen in Example 9.3.12. On
the other hand, if p and q are far apart, then the algorithm is quite inefficient, as
shown in Example 9.3.13.
Our next algorithm improves on this factoring method.

Modular Fermat Factorization

The idea is to find integers x and k so that x^2 − kn is a square of some integer y.
Then, x^2 − kn = y^2 and so

x^2 ≡ y^2 (mod n). (9.4)

We then have

(x + y)(x − y) = kn,

and computing gcd(n, x + y) and gcd(n, x − y) will yield the prime factors of n.
The algorithm, which we call modular Fermat factorization, is a systematic
way of solving the congruence (9.4). It uses the notion of a “smooth” integer. Let
B ≥ 2 be a real number. An integer m ≥ 2 is B-smooth if each prime factor of m
is less than or equal to B. For example, 16 is 2-smooth and n! is n-smooth for all
n ≥ 2.
For x ∈ R+ , let π(x) denote the number of prime numbers ≤ x, see [47, p.
72, Definition]. For later use, we state a famous result ([47, Theorem 3.4]), which
allows us to approximate the value of π(x).
Theorem 9.3.15 (Prime Number Theorem) For x ∈ R+ ,

π(x)
lim = 1.
x→∞ x ln(x)

Here is the algorithm.


Algorithm 9.3.16 (MOD_FERM_FACT)
Input: an integer n, which is the product of primes n = pq
with p, q ≥ 3.
Output: a prime factor p
Algorithm:

Step 1. Choose a bound B ≥ 2 and find a sequence of integers m1 , m2 , m3 , . . .
for which (mi^2 mod n) is B-smooth. Once π(B) such residues have been found, form
a system (re-indexing the mj if necessary):

m1^2 ≡ q1^(e1,1) q2^(e1,2) · · · qk^(e1,k) (mod n)
m2^2 ≡ q1^(e2,1) q2^(e2,2) · · · qk^(e2,k) (mod n)
⋮
mr^2 ≡ q1^(er,1) q2^(er,2) · · · qk^(er,k) (mod n)        (9.5)

for some k, r, with k ≤ π(B) ≤ r. Here q1 , q2 , · · · , qk are primes ≤ B and
ea,b ≥ 0 for 1 ≤ a ≤ r, 1 ≤ b ≤ k.
Step 2. From (9.5), we take a product of the ma^2, 1 ≤ a ≤ r,

(m1^2)^d1 (m2^2)^d2 · · · (mr^2)^dr,

da ∈ {0, 1}, so that this product satisfies

(m1^2)^d1 (m2^2)^d2 · · · (mr^2)^dr ≡ y^2 (mod n),

for some (y mod n). Then

(m1^2)^d1 (m2^2)^d2 · · · (mr^2)^dr ≡ (m1^d1)^2 (m2^d2)^2 · · · (mr^dr)^2
≡ (m1^d1 m2^d2 · · · mr^dr)^2
≡ y^2 (mod n).

Thus x = m1^d1 m2^d2 · · · mr^dr yields a solution to (9.4).
Step 3. We check gcd(n, x + y) and gcd(n, x − y) to obtain the factors of n. 

Example 9.3.17 We use MOD_FERM_FACT to factor n = 115993.
We set B = 7 and find the following congruences:

(50948)^2 ≡ 2 · 3 · 5^2 · 7^2 (mod 115993)
(8596)^2 ≡ 3 · 5^2 · 7^2 (mod 115993)
(46970)^2 ≡ 2^7 · 3 · 7^4 (mod 115993).

We have

(50948)^2 (46970)^2 ≡ (50948 · 46970)^2 (mod 115993)
≡ (91970)^2 (mod 115993)
≡ 2^8 · 3^2 · 5^2 · 7^6 (mod 115993)
≡ (2^4 · 3 · 5 · 7^3)^2 (mod 115993)
≡ (82320)^2 (mod 115993).

And so, the factors are

gcd(115993, 91970 − 82320) = 193 and gcd(115993, 91970 + 82320) = 601.
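The final step of this example is easy to verify directly: given x = 91970 and y = 82320 with x^2 ≡ y^2 (mod n), the two gcds split n (a quick illustrative check in Python):

```python
from math import gcd

n = 115993
x, y = 91970, 82320               # from the example: x^2 ≡ y^2 (mod n)

assert (x * x - y * y) % n == 0   # the congruence (9.4) holds

p = gcd(n, x - y)
q = gcd(n, x + y)
print(p, q, p * q == n)           # 193 601 True
```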



How efficient is MOD_FERM_FACT? There is an important theorem of E. R.
Canfield et al. [9] that allows us to complete Step 1 in a reasonable (yet non-
polynomial) amount of time.
For an integer n ≥ 1, define

L(n) = e^((ln(n))^(1/2) (ln(ln(n)))^(1/2)).

For n ≥ 2, let Ψ(n, B) be the number of B-smooth integers j with 2 ≤ j ≤ n.
For instance, Ψ(10, 3) = 4 since 2, 3, 4, and 9 are the only 3-smooth integers with
2 ≤ j ≤ 10.
Theorem 9.3.18 (Canfield, Erdős, and Pomerance) Let c, 0 < c < 1. Then

lim_{n→∞} [Ψ(n, L(n)^c)/n] / [1/L(n)^(1/(2c))] = 1.



Corollary 9.3.19 Let n be a large integer and let B = L(n)^(1/√2). In a random
sequence of L(n)^(√2) integers modulo n, we expect to find π(L(n)^(1/√2)) integers that
are L(n)^(1/√2)-smooth.
Proof For any c, 0 < c < 1, the probability that a random integer modulo n is
L(n)^c-smooth is

Ψ(n, L(n)^c)/n ≈ 1/L(n)^(1/(2c)).

And so, we need to check approximately

π(L(n)^c) L(n)^(1/(2c)) (9.6)

integers to guarantee that π(L(n)^c) of them are L(n)^c-smooth.
By the Prime Number Theorem,

π(L(n)^c) L(n)^(1/(2c)) ≈ [L(n)^c / ln(L(n)^c)] · L(n)^(1/(2c)) ≈ L(n)^(c + 1/(2c)).

So we need to check L(n)^(c + 1/(2c)) integers to find π(L(n)^c) integers that are L(n)^c-
smooth.
Using some elementary calculus, we find that L(n)^(c + 1/(2c)) is minimized when c =
1/√2, and its minimum value is L(n)^(√2). 
Proposition 9.3.20 Let n = pq be a product of primes p, q ≥ 3. Then
MOD_FERM_FACT factors n in subexponential time

O(2^(c (log2(n))^(1/2) (log2(log2(n)))^(1/2))),

where c > 0 is a small constant.
Proof (Sketch) We compute the amount of time needed to complete Step 1. If
an integer (m mod n) is chosen at random, then the integer (m^2 mod n) is essen-
tially random. By Corollary 9.3.19, in a random sequence of ⌈L(n)^(√2)⌉ residues
(m^2 mod n), we expect to find π(L(n)^(1/√2)) residues (m^2 mod n) that are L(n)^(1/√2)-
smooth. Thus Step 1 takes time O(L(n)^(√2)). Thus the entire algorithm runs in
subexponential time O(2^(c (log2(n))^(1/2) (log2(log2(n)))^(1/2))), where c > 0 is a small constant.

Remark 9.3.21 We can reduce the time required to complete Step 1 down to
O(L(n)) using the quadratic sieve developed by C. Pomerance [44], [25, Section
3.7.2]. Thus the total running time of Algorithm 9.3.16 can be reduced to

O(2^((log2(n))^(1/2) (log2(log2(n)))^(1/2))).

Assuming that a computer can perform at most 2^40 basic operations in a
reasonable amount of time, Algorithm 9.3.16, using the quadratic sieve, can factor
numbers n = pq up to size 2^206. This follows since

2^40 ≈ 2^(√(log2(2^206) · log2(log2(2^206)))).
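The estimate behind the 2^206 figure can be checked numerically (a quick illustrative computation in Python):

```python
import math

bits = 206                            # modulus size: n = 2^206
log2_n = bits                         # log2(n)
log2_log2_n = math.log2(bits)         # log2(log2(n)), about 7.69

exponent = math.sqrt(log2_n * log2_log2_n)
print(exponent)                       # about 39.8, i.e., roughly 2^40 operations
```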

Thus, to ensure security in RSA, the chosen primes must be at least 206-bit
primes. In fact, in RSA-2048 two 1024-bit primes are chosen, and in RSA-4096,
two 2048-bit primes are used.

9.4 The ElGamal Public Key Cryptosystem

Our second public key cryptosystem is based on the discrete exponential function.
Definition 9.4.1 (Discrete Exponential Function) Let p be a random l-bit prime,
and let g be a primitive root modulo p, i.e., ⟨g⟩ = U (Zp ). Define a function

DEXPp,g : U (Zp ) → U (Zp )

by the rule

DEXPp,g (x) = (g^x mod p), x ∈ U (Zp );

DEXPp,g is the discrete exponential function.


Since g is a primitive root modulo p, g generates the cyclic group U (Zp ), which
has order p − 1. Thus DEXPp,g is a 1-1 correspondence, and so the inverse of
DEXPp,g exists.
Definition 9.4.2 (Discrete Logarithmic Function) The inverse of
DEXPp,g is the function

DLOGp,g : U (Zp ) → U (Zp ),

defined as follows: for y ∈ U (Zp ), DLOGp,g (y) is the unique element x ∈ U (Zp )
for which y = (g^x mod p); DLOGp,g is the discrete logarithm function.
For example, let p = 31. Then g = 3 is a primitive root modulo 31. We
have DEXP31,3 (17) = 22 and DLOG31,3 (22) = 17; DEXP31,3 (6) = 16 and
DLOG31,3 (16) = 6.
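For a 5-bit prime these values can be checked by brute force, since the discrete logarithm can be found by trying every exponent (an illustrative Python sketch; for cryptographic sizes no comparably efficient search is known):

```python
def dexp(p, g, x):
    """Discrete exponential: g^x mod p."""
    return pow(g, x, p)

def dlog(p, g, y):
    """Discrete logarithm by exhaustive search; feasible only for tiny p."""
    for x in range(1, p):
        if pow(g, x, p) == y:
            return x
    return None

print(dexp(31, 3, 17), dlog(31, 3, 22))   # 22 17
print(dexp(31, 3, 6), dlog(31, 3, 16))    # 16 6
```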
If p is an l-bit prime, then we can consider DEXPp,g and DLOGp,g as functions
on {0, 1}^l; passing to base 2 expansions yields

DEXPp,g : {0, 1}^l → {0, 1}^l,

DLOGp,g : {0, 1}^l → {0, 1}^l.

For example, for the 5-bit prime 31, we have DEXP31,3 (10001) = 10110 and
DLOG31,3 (10000) = 00110.

Let p be a random l-bit prime, and let g be a primitive root modulo p. We assume
that it is “hard” to compute DLOGp,g (y), where y is a randomly chosen element of
U (Zp ). More formally, we assume the following.
The Discrete Logarithm Assumption (DLA) Let w(x) ∈ Z[x] be a positive
polynomial, let p be a randomly chosen l-bit prime, let g be a primitive root modulo
p, and let y ∈R U (Zp ). Let Ap,g be a probabilistic polynomial time algorithm
dependent on p, g with input y and output Ap,g (y) ∈ U (Zp ). Then there exists an
integer l0 ≥ 1 for which

Pr(Ap,g (y) = DLOGp,g (y)) < 1/w(l),

whenever l ≥ l0 . In other words, as a function of l, the probability

Pr(Ap,g (y) = DLOGp,g (y))

is a negligible function of l.
Given random y ∈ U (Zp ), the DLA says that there is no probabilistic
polynomial time algorithm that inverts DEXPp,g , i.e., finds x ∈ U (Zp ) so that
DEXPp,g (x) = y. There is no polynomial time algorithm that computes DLOGp,g .
The DLA (if true) ensures the security of our next public key cryptosystem.
Definition 9.4.3 (ElGamal Public Key Cryptosystem) Alice wants to send a
secret message to Bob. Let p be a prime number, let g be a primitive root modulo
p, and let x ∈ U (Zp ). Let Σ = {0, 1, 2, 3, . . . , p − 1} denote the set of p letters.
Let M = Σ∗. A message is a finite sequence of letters
M = M0 M1 · · · Mr−1 . Bob’s public key is the triple (p, g, (g^x mod p)) and his
private key is x ∈ U (Zp ).
Using Bob’s public key, Alice encrypts the message M = M0 M1 · · · Mr−1 as
follows: she chooses an element y ∈ U (Zp ) at random and computes

h = (g^y mod p), and Ni = (Mi (g^x)^y mod p),

for 0 ≤ i ≤ r − 1. The encryption transformation is

Ci = e(Mi , (p, g, (g^x mod p))) = (h, Ni ),

for 0 ≤ i ≤ r − 1. The resulting ciphertext is

C = C0 C1 · · · Cr−1 = (h, N0 )(h, N1 ) · · · (h, Nr−1 ).

Alice then sends the ciphertext C to Bob.



Bob decrypts the message C by computing

d(Ci , x) = d((h, Ni ), x) = ((h^(−x) Ni ) mod p),

for 0 ≤ i ≤ r − 1.
This is the ElGamal public key cryptosystem. 

Here is an example of the ElGamal cryptosystem.
Example 9.4.4 Let p = 29. Then g = 2 is a primitive root modulo 29. Let Σ =
{0, 1, 2, 3, . . . , 28} denote the set of 29 letters. Let x = 10 ∈ U (Z29 ). Note that
2^10 ≡ 9 (mod 29). Bob’s public key is (29, 2, 9) and his private key is x = 10.
With the choice of y = 5, Alice encrypts C A T ↔ 2 0 19 as follows:

C0 = e(2, (29, 2, 9)) = ((2^5 mod 29), ((2 · 9^5) mod 29)) = (3, 10),

C1 = e(0, (29, 2, 9)) = ((2^5 mod 29), ((0 · 9^5) mod 29)) = (3, 0),

C2 = e(19, (29, 2, 9)) = ((2^5 mod 29), ((19 · 9^5) mod 29)) = (3, 8).

So C = (3, 10)(3, 0)(3, 8). Alice sends C to Bob who decrypts to obtain

M0 = d((3, 10), 10) = ((3^(−10) · 10) mod 29) = 2,

M1 = d((3, 0), 10) = ((3^(−10) · 0) mod 29) = 0,

M2 = d((3, 8), 10) = ((3^(−10) · 8) mod 29) = 19.

Thus Bob recovers the message 2 0 19 ↔ C A T.


In another communication, Alice has used y = 17 to encrypt a message as C =
(21, 28)(21, 3). She then sends C to Bob. To decrypt, Bob computes

M0 = d((21, 28), 10) = ((21^(−10) · 28) mod 29) = ((22 · 28) mod 29) = 7,

M1 = d((21, 3), 10) = ((21^(−10) · 3) mod 29) = ((22 · 3) mod 29) = 8.

And so, M = 7 8 ↔ H I. 
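Example 9.4.4 can be replayed in a few lines (illustrative Python; the book's own programs use GAP). Note that the decryption factor h^(−x) is a modular inverse, which Python's pow computes with a negative exponent (Python 3.8+):

```python
def elgamal_encrypt(M, p, g, gx, y):
    """Encrypt the letters M with public key (p, g, gx) and ephemeral y."""
    h = pow(g, y, p)                       # g^y mod p
    shared = pow(gx, y, p)                 # (g^x)^y mod p
    return [(h, (Mi * shared) % p) for Mi in M]

def elgamal_decrypt(C, p, x):
    """Decrypt with private key x: Mi = (h^(-x) * Ni) mod p."""
    return [(pow(h, -x, p) * Ni) % p for (h, Ni) in C]

p, g, x = 29, 2, 10
gx = pow(g, x, p)                          # 9; Bob's public key is (29, 2, 9)

C = elgamal_encrypt([2, 0, 19], p, g, gx, y=5)
print(C)                                   # [(3, 10), (3, 0), (3, 8)]
print(elgamal_decrypt(C, p, x))            # [2, 0, 19], i.e., C A T
```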

Malice knows Bob’s ElGamal public key (p, g, (g^x mod p)). Yet the DLA (if
true) guarantees that he cannot compute Bob’s trapdoor x in polynomial time.
Malice can still attack ElGamal by finding Bob’s trapdoor x = DLOGp,g (g^x mod p)
using non-polynomial methods. In Section 12.2, we will discuss several non-
polynomial time algorithms that compute DLOGp,g .

9.5 Hybrid Ciphers

The most widespread use of public key cryptosystems (such as RSA) is as a


component of a hybrid cipher, a cryptosystem in which a public key cryptosystem
is used together with a symmetric cryptosystem. In a hybrid cipher, a public key
cryptosystem is used to encrypt the symmetric key used in a companion symmetric
cryptosystem.
Definition 9.5.1 (Hybrid Cipher) Alice and Bob agree to use the symmetric
cryptosystem

M, C, e, d, K

and the public key cryptosystem

M′, C′, e′, d′, Ke′, Kd′

for their hybrid cipher.
Step 1. Alice encrypts message M using the symmetric cryptosystem to yield the
ciphertext C = e(M, k).
Step 2. Alice encrypts the symmetric key k using the public key cryptosystem to
give the ciphertext c = e′(k, ke′), ke′ ∈ Ke′. The package (C, c) is then
sent to Bob.
Step 3. Bob uses his trapdoor kd′ to obtain the symmetric key k = d′(c, kd′) and
then decrypts the ciphertext: M = d(C, k). 

This is the so-called (key encapsulation mechanism/data encapsulation mech-
anism (KEM/DEM) approach, see [56, Section 16.3].
Example 9.5.2 In this hybrid cipher, we let

M, C, e, d, K

be the Vigenère cipher with n = 10 and symmetric key k = 2619. We let

M′, C′, e′, d′, Ke′, Kd′

be RSA; Bob’s public key is (s, n) = (5359, 1358543) and his trapdoor is t =
20239, see Example 9.2.3.

Alice encrypts the message M = 3343001906 as

C = 5952262822 = e(3343001906, 2619).

Alice then encrypts k = 2619 as

c = e′(2619, (5359, 1358543)) = (2619^5359 mod 1358543) = 455068.

Next, Alice sends the package (C, c) = (5952262822, 455068) to Bob.
Bob first decrypts c with his trapdoor:

k = d′(455068, 20239) = (455068^20239 mod 1358543) = 2619

and then decrypts C to obtain

M = d(5952262822, 2619) = 3343001906.
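Example 9.5.2 combines both layers; the sketch below implements the digit-wise Vigenère cipher (addition mod 10 with a repeating key) together with RSA key encapsulation using the parameters of Example 9.2.3 (illustrative Python; the helper name vigenere is ours, not the book's):

```python
def vigenere(digits, key, sign=1):
    """Digit-wise Vigenère with n = 10: add (sign=1) or subtract (sign=-1)
    the repeating key, digit by digit, modulo 10."""
    k = [int(c) for c in key]
    return "".join(str((int(d) + sign * k[i % len(k)]) % 10)
                   for i, d in enumerate(digits))

# Symmetric layer: Vigenère with key k = 2619
M = "3343001906"
C = vigenere(M, "2619")             # "5952262822"

# Public key layer: encapsulate the symmetric key with RSA (Example 9.2.3)
s, n, t = 5359, 1358543, 20239
c = pow(2619, s, n)                 # the book computes c = 455068

# Bob: recover the key with his trapdoor, then decrypt the message
k = pow(c, t, n)                    # 2619
print(C, k, vigenere(C, str(k), sign=-1))
```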



Remark 9.5.3 As noted in the hybrid cipher, plaintext English messages are
generally not encrypted with RSA (or other public key cryptosystems). Public key
cryptography is used almost exclusively to encrypt keys. Thus, we do not care as
much about the unicity distance of RSA or other public key cryptosystems, though
the unicity distance can be computed in theory. 


9.6 Symmetric vs. Public Key Cryptography

One important advantage of public key cryptography over symmetric cryptography
lies in its use in communication networks.
Suppose that a communication network consists of N ≥ 2 principals (or
vertices). Each of the N principals wishes to communicate securely with each of
the other N − 1 principals using a symmetric key cryptosystem.
In order to establish a shared secret key, each principal must make an arrange-
ment to meet each of the other principals to exchange the key. This involves a total
of N(N − 1)/2 trips (see Section 9.7, Exercise 24).
For example, suppose that the network consists of N = 6 principals, listed as
A, B, C, D, E, and F . In this case, (6 · 5)/2 = 15 trips have to be made to exchange
the 15 pairwise shared keys. The network is modeled by the complete graph on 6 vertices (see
Figure 9.1).
For large N, however, it is not feasible to make these trips; for instance, if N =
10^5, then 4,999,950,000 trips are required.
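The trip count N(N − 1)/2 is easy to check numerically; the helper name `trips` is ours.

```python
def trips(n_principals):
    # Each unordered pair of principals requires one key-exchange trip,
    # so the count is the binomial coefficient C(N, 2) = N(N-1)/2.
    return n_principals * (n_principals - 1) // 2

print(trips(6))       # 15 trips for the 6-principal network
print(trips(10**5))   # 4999950000 trips for N = 10^5
```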

Fig. 9.1 Communication network: 6 principals, using symmetric keys: kAB is the
shared key between Alice and Bob, and so on. [The figure shows the complete graph
on the vertices A, B, C, D, E, F , with one shared key kXY labeling each edge.]

Fig. 9.2 Communication network: N principals, using public keys. [The figure shows
a hub-and-spoke diagram: each principal Ai holds its private key priv(Ai ), and the
hub lists the public keys pub(A1 ), . . . , pub(AN ).]

On the other hand, if principals A1 , A2 , . . . , AN employ a public key cryptosys-


tem, then they need only to establish N public key/private key pairs, denoted as

(pub(A1 ), priv(A1 )), (pub(A2 ), priv(A2 )) , . . . , (pub(AN ), priv(AN )).

The network is now modeled by a hub-and-spoke diagram and is greatly


simplified (see Figure 9.2). Note that the hub (or center node) is the listing of all
public keys.
To send a secure message M to Bob (= A2 ), Alice (= A1 ) first looks up
Bob’s public key pub(A2 ) in the center node. Alice then encrypts to form C =
e(M, pub(A2 )) and sends C to Bob. Bob then decrypts with his private key priv(A2 )
to yield M = d(C, priv(A2 )).
Here is a general comparison between symmetric key and public key cryptogra-
phy.
Advantages of Symmetric Cryptosystems
1. The encryption and decryption transformations in symmetric key cryptosystems
are simple to implement and allow for very fast encryption/decryption.

2. Symmetric key encryption transformations can be composed or iterated to


produce stronger ciphers (as in the Feistel cipher and DES).
3. Symmetric key cryptosystems can provide perfect secrecy as in the Vernam
cipher.
Disadvantages of Symmetric Cryptosystems
1. In a symmetric key cryptosystem, the shared key must be kept secret by both the
sender and the receiver.
2. In a large communication network, there are many shared keys to be managed;
the network graph is complex.
Advantages of Public Key Cryptosystems
1. In a public key cryptosystem, only the private key needs to be kept secret.
2. In a large communication network using a public key cryptosystem, there are
fewer keys to be managed; the network graph is simplified.
3. Public key cryptosystems can be employed as digital signature schemes (see
Chapter 10).
Disadvantages of Public Key Cryptosystems
1. Encryption and decryption are not as fast or efficient as in a symmetric key
cryptosystem.
2. No public key cryptosystem has been proven to achieve perfect secrecy.
3. Public key encryption transformations are modeled on the concept of a one-
way trapdoor function. Unfortunately, one-way trapdoor functions have not been
proven to exist.

9.7 Exercises

1. Let f : R+ → R+ be the function defined as f (x) = 1/ ln(x). Determine whether
f is a negligible function of x.
2. Let f : R+ → R+ be the function defined as f (x) = x/2^x . Prove that f is
negligible.
3. Let M, C, e, d, Ke , Kd  denote the RSA cryptosystem with public key
(s, n) = (21, 4981), private key t = 445, and alphabet
Σ = {0, 1, 2, . . . , 4980} with M = C = {0, 1, 2, . . . , 4980}∗ .
(a) Compute C = e(352, (21, 4981)).
(b) Verify that 352 = d(e(352, (21, 4981)), 445).
4. Let M, C, e, d, Ke , Kd  denote the RSA cryptosystem with public key
(s, n) = (631, 7991). Suppose that Malice has obtained (somehow) the prime
factorization 7991 = 61 · 131. How can Malice obtain the private key t? Find
the value of the private key.

5. Let M, C, e, d, Ke , Kd  denote the RSA cryptosystem with public key


(s, n) = (17, 3977) and private key t = 2033.
(a) Compute C = e(65, (17, 3977)).
(b) Show that the size of the private keyspace |Kd | must be smaller than 3977.
6. Create your own RSA public key (s, n) and private key t pair in which the
chosen primes satisfy p, q > 100.
7. Why can (or should) the primes p, q in RSA not be identical?
8. Use Pollard’s p − 1 algorithm to factor 689.
9. Use Pollard’s ρ algorithm to factor 36287.
10. Use modular Fermat factorization to factor 198103.
11. Let p be a prime number of the form p = 2^m + 1 for some m > 0. Prove that
m = 2^k for some k ≥ 0. Primes of the form 2^(2^k) + 1, k ≥ 0, are Fermat primes.
(It is not known whether there are an infinite number of Fermat primes; in fact,
the largest known Fermat prime is 2^(2^4) + 1 = 65537.)
12. Discuss the Pollard p − 1 factorization method in the case that n = pq, where
p and q are Fermat primes.
13. Suppose that n = pq is a product of a Fermat prime p = 2^(2^k) + 1, k ≥ 0, and
some other prime q ≠ p. Let x0 be a primitive root modulo p. Suppose we use
Pollard ρ with f (x) = x^2 (instead of f (x) = x^2 + 1) and seed x0 to factor n.
Show that Pollard ρ factors n in polynomial time.
14. Find the minimal number of terms that need to be selected at random from the
residue ring Z43 (with replacement) so that the probability of a collision is at
least 0.95.
15. Let f : Z43 → Z43 be the function defined as f (x) = ((x^2 + 1) mod 43).
Consider the sequence defined as xi = f (xi−1 ) with seed x0 = 2. Find the
smallest j so that xi ≡ xj (mod 43), j > i.
16. In this exercise we introduce the Cocks–Ellis public key cryptosystem which
actually predates RSA.
Cocks–Ellis public key cryptosystem
As always, Alice wants to send a secret message to Bob. Let p and q be large
prime numbers for which p ∤ q − 1 and q ∤ p − 1. Let n = pq, and let
Σ = {0, 1, 2, 3, . . . , n − 1} denote the set of n letters. We let M = C = Σ∗ . A
message M ∈ M is a finite sequence of letters in Σ.
The integer n is Bob’s public key (the encryption key) which he publishes
for all to see. Using Bob’s public key, Alice encrypts the message M =
M1 M2 M3 · · · Mr letter by letter to form the ciphertext C = C1 C2 C3 · · · Cr
using the rule

Ci = e(Mi , n) = (Mi^n mod n).

She then sends the ciphertext C to Bob.


Bob’s private key is a 6-tuple of integers

kd = (p, q, r, s, u, v),

where p and q are primes as above, and r, s, u, and v satisfy the conditions:
pr ≡ 1 (mod q − 1), qs ≡ 1 (mod p − 1), pu ≡ 1 (mod q), and qv ≡ 1
(mod p).
Decryption of the ciphertext C = C1 C2 · · · Cr proceeds letter by letter using
the decryption function

d(x, (p, q, r, s, u, v)) : C → M,

defined as

Mi = d(Ci , (p, q, r, s, u, v)) = ((qvCi^s + puCi^r ) mod n).

(a) Suppose Bob’s public key is n = 8557 and his private key is

(p, q, r, s, u, v) = (43, 199, 175, 19, 162, 8).

Compute

C = e(352 56, 8557) and M = d(7, (43, 199, 175, 19, 162, 8)).

(b) Show that the Cocks–Ellis cryptosystem works, i.e., verify formula (8.1).
(c) Prove that the Cocks–Ellis cryptosystem is a special case of RSA.
17. Show that g = 3 is a primitive root modulo 31. Compute DEXP31,3 (20) and
DLOG31,3 (25).
18. Find all of the primitive roots modulo 43.
19. It is known that g = 3 is a primitive root modulo the prime number p =
257. Write an algorithm that computes DLOG257,3 (219). How good is your
algorithm? When generalized to large primes, is the implementation of your
algorithm feasible?
20. Let M, C, e, d, Ke , Kd  denote the ElGamal cryptosystem with public key
(p, g, (g^x mod p)) = (29, 8, 3), private key x = 11, alphabet
Σ = {0, 1, 2, . . . , 28}, and M = C = {0, 1, 2, . . . , 28}∗ . Verify that 20 =
d(e(20, (29, 8, 3)), 11).
21. Prove that the ElGamal public key cryptosystem works.
22. Suppose that a communication network contains 1000 principals. How many
trips are necessary if each pair of principals wants to establish a shared secret
key?
23. In a communication network, suppose that 1225 trips are necessary for each
pair of principals to establish a shared secret key. How many principals are in
the network?
24. Suppose there are N principals in a communication network. Prove that there
are N (N − 1)/2 trips necessary if each pair of principals wants to establish a
shared secret key.
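The Cocks–Ellis system of Exercise 16 can be checked numerically with the key material given there (n = 8557 = 43 · 199 and private 6-tuple (43, 199, 175, 19, 162, 8)); the function names `encrypt` and `decrypt` are ours.

```python
# Cocks-Ellis cryptosystem with the keys of Exercise 16.
n = 8557                          # Bob's public key, n = 43 * 199
p, q, r, s, u, v = 43, 199, 175, 19, 162, 8   # Bob's private key

def encrypt(m):
    # Letter-by-letter encryption rule C_i = M_i^n mod n.
    return pow(m, n, n)

def decrypt(c):
    # Decryption rule M_i = (q*v*C_i^s + p*u*C_i^r) mod n.
    return (q * v * pow(c, s, n) + p * u * pow(c, r, n)) % n

# Round-trip check on a few letters (the content of part (b)).
for m in (7, 56, 352, 1234):
    assert decrypt(encrypt(m)) == m
```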
Chapter 10
Digital Signature Schemes

In this chapter we show how public key cryptosystems can be used to create “digital”
signatures.

10.1 Introduction to Digital Signature Schemes

Consider the following situations that may occur in a communications network:


1. Malice (posing as “Alice”) sends Bob the message:
Withdraw $1000 from my (Alice’s) bank account and
give it to Malice.
or
2. Alice sends to Bob the message:
Transfer $1000 from my (Alice’s) bank account and
give it to Carol.
Malice intercepts the message, replaces Carol with Malice, and then relays
the altered message to Bob.
or
3. Alice sends to Bob the message:
Transfer $1000 from my (Alice’s) bank account and
give it to Malice.
Malice stores this message and resends it to Bob at a later time.
In each case, Bob should verify that the message purportedly from Alice actually
came from Alice, and not Malice. That is, Bob should have some method for
verifying that the received message reflects the intent of Alice. Any such method
is called message authentication.
Historically, message authentication has been established by a handwritten
signature affixed to the message. A digital signature or a digital signature scheme

© Springer Nature Switzerland AG 2022


R. G. Underwood, Cryptography for Secure Encryption, Universitext,
https://doi.org/10.1007/978-3-030-97902-7_10

(DSS) is a method for achieving message authentication through the use of a


cryptosystem (usually public key). A DSS is the “digital analog of the handwritten
signature.”
A digital signature should provide the same guarantees as a traditional signature,
namely it should achieve:
1. Unforgeability: Only Alice should be able to sign her name to a message.
2. Undeniability: Alice should not be able to deny that she signed the message at
some later time.
3. Authentication: The signature should authenticate the message.
In order for Alice’s digital signature to provide unforgeability, it should rely
on some secret known only to her (her private key!). In order for Alice’s digital
signature to provide authentication, it should depend on the message in some way;
there should be a cryptographic link between the signature and the message.
Most public key cryptosystems can be employed as digital signature schemes.
We require the following conditions on the cryptosystem. First, we have the usual
condition that is required of all cryptosystems, recall (8.1): For each message M and
each public key ke , there exists a private key kd for which

d(e(M, ke ), kd ) = M. (10.1)

Second, we require an additional condition: For each “message” M and each private
key kd , there exists a public key ke for which

e(d(M, kd ), ke ) = M. (10.2)

Definition 10.1.1 (Generic Digital Signature Scheme) Let

M, C, e, d, Ke , Kd 

be a public key cryptosystem in which (10.1) and (10.2) hold.


Step 1. Alice establishes her public key ke and trapdoor (private key) kd .
Step 2. Let M be a message that Alice intends to send to Bob. Using her private
key kd , Alice signs the message by computing S = d(M, kd ).
Step 3. Alice sends the message, signature pair (M, S) to Bob.
Step 4. Using Alice’s public key ke , Bob then computes e(S, ke ). If

e(S, ke ) = M = e(d(M, kd ), ke ),

then Bob is assured that message M originated with Alice; he accepts. If

e(S, ke ) ≠ M,

then Bob rejects; there is no message authentication.





Authentication is achieved since property (10.2) holds; only those who possess
Alice’s trapdoor can sign M, thereby guaranteeing that the original message M is
recovered with Alice’s public key.

10.2 The RSA Digital Signature Scheme

Can the RSA cryptosystem be used to create a digital signature scheme? We check
that (10.2) holds.
Proposition 10.2.1 Let M, C, e, d, Ke , Kd  be the RSA public key cryptosystem.
For each M ∈ M and kd ∈ Kd , there exists an element ke ∈ Ke so that

e(d(M, kd ), ke ) = M.

Proof Let (s, n) be an RSA public key and let t be the corresponding RSA private
key. Then for each M, 0 ≤ M ≤ n − 1,

e(d(M, t), (s, n)) = e((M^t mod n), (s, n))
= ((M^t mod n)^s mod n)
= (M^(ts) mod n).

There exists an integer a with

st = 1 + a(p − 1)(q − 1),

hence

(M^(ts) mod p) = (M · M^(a(p−1)(q−1)) mod p) = (M mod p)

by Fermat’s Little Theorem. Similarly,

(M^(ts) mod q) = (M · M^(a(p−1)(q−1)) mod q) = (M mod q).

Thus by the Chinese Remainder Theorem, (10.2) holds. 



Definition 10.2.2 (RSA Digital Signature Scheme)
Step 1. Alice establishes an RSA public key (s, n) and private key t.
Step 2. Let M be a message that Alice intends to send to Bob. Using her private
key t, Alice signs the message by computing S = d(M, t) = (M^t mod n).
Step 3. Alice sends the message, signature pair (M, S) to Bob.
Step 4. Using Alice’s public key (s, n), Bob then computes e(S, (s, n)) =
(S^s mod n). If e(S, (s, n)) = M, then Bob is assured that message M

originated with Alice; he accepts. If e(S, (s, n)) ≠ M, then Bob rejects;
there is no message authentication.


The RSA digital signature scheme is effective because e(x, (s, n)) has the
qualities of a one-way trapdoor function: It is essentially impossible for Malice to
find x so that

M = e(x, (s, n)) = (x^s mod n)

without knowledge of Alice’s trapdoor t.


Example 10.2.3 Suppose Alice’s RSA public key is chosen to be (s, n) =
(1003, 85651) and her private key is computed to be t = 13507. (Note that
85651 = pq = (97)(883).)
She signs the message M = M1 M2 = 2 100 by computing S = S1 S2 with

S1 = d(2, 13507) = (2^13507 mod 85651) = 37736.

S2 = d(100, 13507) = (100^13507 mod 85651) = 58586.

Alice then sends the message, signature pair

(M, S) = (2 100, 37736 58586)

to Bob. Bob can verify that M came from Alice by computing

e(S, (1003, 85651)) = M,

thus:

M1 = e(S1 , (s, n))


= e(37736, (1003, 85651))
= (37736^1003 mod 85651)
= 2,

M2 = e(S2 , (s, n))


= e(58586, (1003, 85651))
= (58586^1003 mod 85651)
= 100.



Example 10.2.4 Assume that Alice’s public key is (1003, 85651) and her private
key is 13507 as in Example 10.2.3. Suppose that Bob receives the message,
signature pair

(M, S) = (345, 7219)

presumably from Alice.


Should Bob conclude that Alice wrote the message? Bob computes

e(7219, (1003, 85651)) = (7219^1003 mod 85651) = 83169.

Since 83169 ≠ 345, Bob rejects!
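Both examples can be replayed with Alice's keys (s, n) = (1003, 85651) and t = 13507; the helper names `sign` and `verify` are ours for the Step 2 and Step 4 computations.

```python
# RSA digital signature scheme with the keys of Examples 10.2.3-10.2.4.
s, n, t = 1003, 85651, 13507

def sign(m):
    # Step 2: S = d(M, t) = M^t mod n, using Alice's private key t.
    return pow(m, t, n)

def verify(m, sig):
    # Step 4: accept iff e(S, (s, n)) = S^s mod n recovers M.
    return pow(sig, s, n) == m

# Example 10.2.3: Alice signs M = 2 100 letter by letter; Bob accepts.
assert all(verify(m, sign(m)) for m in (2, 100))

# Example 10.2.4: the pair (345, 7219) fails verification; Bob rejects.
assert not verify(345, 7219)
```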





10.3 Signature with Privacy

In the RSA DSS, Alice sends the message, signature pair (M, S) with M in
plaintext. Thus anyone intercepting the transmission can read Alice’s message. If
Alice wants to sign an encrypted message, she should use a “signature with privacy
scheme.”
Definition 10.3.1 (RSA Signature with Privacy)
Step 1. Alice establishes a public key and private key pair (sA , nA ), tA . Bob
establishes a public key and private key pair (sB , nB ), tB .
Step 2. Alice signs the message M as

S = d(M, tA ) = (M^tA mod nA ).

Step 3. Alice then encrypts the message M using Bob’s public key:

C1 = e(M, (sB , nB )) = (M^sB mod nB ).

She also encrypts the signature S using Bob’s public key:

C2 = e(S, (sB , nB )) = (S^sB mod nB ).

She then sends the pair (C1 , C2 ) to Bob.


Step 4. Bob uses his private key tB to recover

M = d(C1 , tB ) = (C1^tB mod nB ).



He also recovers

S = d(C2 , tB ) = (C2^tB mod nB ).

Step 5. Finally, using Alice’s public key (sA , nA ) he authenticates the message by
verifying that

e(S, (sA , nA )) = e(d(M, tA ), (sA , nA )) = M.





10.4 Security of Digital Signature Schemes

Suppose Alice and Bob are using a digital signature scheme for message authen-
tication. Malice can attack (or break) the digital signature scheme by producing
forgeries. Specifically, a forgery is a message, signature pair (M, S) for which S is
Alice’s signature of the message M.
Essentially, there are two types of forgeries. An existential forgery is a forgery
of the form (M, S) for some M ∈ M. A selective forgery is a forgery of the form
(M, S) in which M is chosen by Malice.
Malice will produce forgeries by engaging in several types of attacks. A direct
attack is an attack in which Malice only knows Alice’s public key. A known-
signature attack is an attack in which Malice knows Alice’s public key together
with a set of message, signature pairs

(M1 , S1 ), (M2 , S2 ), . . . , (Mr , Sr )

signed by Alice. A chosen-message attack is an attack in which Malice knows


Alice’s public key and has (somehow) convinced her to sign a set of messages
M1 , M2 , . . . , Mr that Malice has chosen.
How secure is the RSA Digital Signature Scheme?
Proposition 10.4.1 The RSA DSS is existentially forgeable under a direct attack.
Proof We show that knowing only Alice’s public key, Malice can forge at least one
message M. Suppose that Alice’s RSA public key is (s, n) and her private key is t.
Malice chooses any integer N, 0 ≤ N ≤ n − 1, and computes

M = e(N, (s, n)) = (N^s mod n).

Malice then sends the message, signature pair (M, N ) to Bob. Bob will conclude
(incorrectly) that Alice signed the message since

e(N, (s, n)) = (N^s mod n) = M.




Example 10.4.2 Suppose Alice’s RSA public key is (s, n) = (1003, 85651) and her
private key is t = 13507. Let N = 600 and compute

M = (600^1003 mod 85651) = 34811.

Then the message, signature pair (M, N ) = (34811, 600) is correctly signed by
Alice: We have

(34811^13507 mod 85651) = 600.




Proposition 10.4.3 The RSA DSS is selectively forgeable under a chosen-message
attack.
Proof Let M, 0 ≤ M ≤ n − 1 be a message chosen by Malice, and let R be an
integer, 1 ≤ R ≤ n − 1, with gcd(R, n) = 1. Suppose that Malice gets Alice to sign
the messages M · R and R^(−1), yielding the message, signature pairs (M · R, T ) and
(R^(−1), U ). Malice computes V = ((T · U ) mod n) and sends the message, signature
pair (M, V ) to Bob. Upon receipt, Bob accepts the message M since it has a valid
signature. Indeed,

d(M, t) = (M^t mod n)
= ((M · (R · R^(−1)))^t mod n)
= (((M · R) · R^(−1))^t mod n)
= ((M · R)^t (R^(−1))^t mod n)
= ((T · U ) mod n)
= V.

Bob concludes (incorrectly) that Alice signed the message M. 
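The chosen-message attack can be simulated directly with the keys of Example 10.2.3; the message M = 31415 and blinding value R = 7 are our own arbitrary choices, and `sign` stands for the signing operation Malice tricks Alice into performing.

```python
# Selective forgery via a chosen-message attack (Proposition 10.4.3).
s, n, t = 1003, 85651, 13507   # Alice's keys from Example 10.2.3

def sign(m):
    # Alice's signing operation, which Malice can only invoke indirectly.
    return pow(m, t, n)

M = 31415                      # message chosen by Malice
R = 7                          # any R with gcd(R, n) = 1
R_inv = pow(R, -1, n)          # R^(-1) mod n (Python 3.8+)

# Malice gets Alice to sign M*R and R^(-1)...
T = sign(M * R % n)
U = sign(R_inv)

# ...and combines the two signatures into a valid signature on M itself.
V = T * U % n
assert pow(V, s, n) == M       # Bob would (incorrectly) accept (M, V)
```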




10.5 Hash Functions and DSS

In this section we introduce cryptographic hash functions as a way to improve the


efficiency and security of the RSA DSS.
Let M, C, e, d, Ke , Kd  be a public key cryptosystem to be used as a DSS.
Definition 10.5.1 A hash function is a function

h:M→M

in which the image h(M), called the digest of message M, is significantly smaller
than M.

Definition 10.5.2 A hash function h : M → M is regular if each digest y ∈


h(M) = {h(M) : M ∈ M} has the same number of preimages in M.
Definition 10.5.3 A hash function h : M → M is cryptographic if h (ideally)
satisfies the following conditions:
(i) h is a “one-way” function, i.e., given M ∈ M, the digest h(M) is easy to
compute, and given h(M), it is hard to find a preimage M′ for which h(M′) =
h(M),
(ii) h destroys any algebraic relationship between messages M1 , M2 and their
digests h(M1 ), h(M2 ), i.e., h is “non-homomorphic.”
(iii) h is “collision resistant,” i.e., it is infeasible to find M1 ≠ M2 ∈ M with
h(M1 ) = h(M2 ).
In what follows, we give examples of hash functions and show that they are
cryptographic.

10.5.1 The Discrete Log Family

A discrete log hash function is a hash function of the form

h : Zn → U (Zp ),

where n = pq is an RSA modulus.


The hash function is constructed as follows. Let (s, n) be Alice’s RSA public key
and let t be her private key with n = pq for primes p, q, with p a safe prime, i.e.,
p = 2r + 1 where r is prime. Suppose that r^2 > n = pq. Let g1 , g2 be primitive
roots modulo p and let

f : Zr × Zr → U (Zp )

be the function given as


f (x, y) = (g1^x g2^y mod p)

for x, y ∈ Zr [10], [59, p. 180].


Then we can produce a digest of any message M, 0 ≤ M ≤ n − 1 as follows:
List the elements of Zr × Zr in genealogical order

(0, 0), (0, 1), (0, 2), . . . , (0, r − 1), (1, 0), . . . , (r − 1, r − 1). (10.3)

Every M in Zn has the form M = ri + j where 0 ≤ j < r and i < r (since n < r^2).
So we have an embedding

λ : Zn → Zr × Zr

by λ(M) = λ(ri + j ) = (i, j ), the (M + 1)st element of the genealogical list (10.3).
Then the hash function is the composite function

h = (f ◦ λ) : Zn → U (Zp ),

where h(M) = h(ri + j ) = (g1^i g2^j mod p).
Example 10.5.4 Choose primes p = 47, q = 11 and RSA modulus n = 47 · 11 =
517. Let (s, n) = (149, 517) be Alice’s public key and let t = 389 be her private
key. Choose g1 = 5 and g2 = 31, which are primitive roots modulo 47. Note that
47 = 2 · 23 + 1 is a safe prime with r = 23 and 23^2 = 529 > 517.
Thus there is an embedding Z517 → Z23 ×Z23 and a corresponding hash function

h : Z517 → U (Z47 ).

For instance, to compute the digest h(100), we first embed 100 ∈ Z517 into Z23 ×Z23
to obtain (4, 8) ∈ Z23 × Z23 . Thus

h(100) = f (4, 8) = ((5^4 · 31^8 ) mod 47) = 24.

Likewise, we compute

h(258) = f (11, 5) = ((5^11 · 31^5 ) mod 47) = 16.
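The hash of Example 10.5.4 is short enough to implement and check directly; the function name `h` follows the text, and `divmod` computes the embedding λ(M) = (i, j).

```python
# Discrete log hash h : Z_517 -> U(Z_47) from Example 10.5.4.
p, r = 47, 23          # safe prime p = 2r + 1, with r^2 = 529 > n = 517
g1, g2 = 5, 31         # primitive roots modulo 47

def h(m):
    # Embed M = r*i + j into Z_r x Z_r, then hash as g1^i * g2^j mod p.
    i, j = divmod(m, r)
    return pow(g1, i, p) * pow(g2, j, p) % p

assert h(100) == 24    # 100 = 23*4 + 8,  f(4, 8)  = (5^4  * 31^8) mod 47
assert h(258) == 16    # 258 = 23*11 + 5, f(11, 5) = (5^11 * 31^5) mod 47
```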




Proposition 10.5.5 Let h : Zn → U (Zp ) be a discrete log hash function. Assume
that h is regular and assume that p is at least a 160-bit prime. Then h is a
cryptographic hash function.
Proof The number of possible digests is |U (Zp )| = p − 1, which is significantly
smaller than n = pq. So it remains to show that the conditions of Definition 10.5.3
are satisfied.
For (i): Assuming that the Discrete Logarithm Assumption holds, h is a one-way
function.
For (ii): The discrete log function h is not a multiplicative homomorphism. For
instance,

h(100 · 258) = h(44) = f (1, 21) = ((5^1 · 31^21 ) mod 47) = 2,

while

h(100) · h(258) = ((24 · 16) mod 47) = 8.

Finally, we check that h is collision resistant. Because h is regular, if we choose m


messages at random from M, then we are essentially choosing m digests at random
from h(M).

By the first Collision Theorem (Proposition 2.3.1), in order to ensure the
likelihood of at least one collision among the randomly selected digests, we need to
choose at least m = 1 + ⌈√(1.4p)⌉ messages in M, and then compute their digests
in h(M). But since p is at least a 160-bit prime, m is at least √p ≈ 2^80.
So, in order to make a collision feasible (i.e., Pr > 1/2), our computer must
perform at least 2^80 operations (message selections), which is beyond the number
of operations that a computer can perform in a reasonable amount of time. Thus h
is collision resistant.



10.5.2 The MD-4 Family

The MD-4 family is a large collection of hash functions of the form

h : {0, 1}^m → {0, 1}^t ,

m > t, computed using multiple rounds and steps. These include MD-4, MD-5, and
SHA-1.
For instance, MD-4 is of the form

h : {0, 1}^512 → {0, 1}^128 ,

and contains 3 rounds of 16 steps each; SHA-1 consists of 4 rounds of 20 steps and
has the form

h : {0, 1}^512 → {0, 1}^160 .

We illustrate how the MD-4 family works by giving an example of a simplified


single round MD-4-type hash function.
Our MD-4 hash function

h : {0, 1}^16 → {0, 1}^8

involves parameters (s1 , s2 , f, σ ) as follows: s1 , s2 are two fixed half-bytes (i.e.,
elements of {0, 1}^4 ), the round function is f (x, y) = x ⊕ y, and σ is a fixed
permutation in S4 ,

σ = ( 0 1 2 3
      1 2 3 0 ) ,

which acts on a half-byte t = t0 t1 t2 t3 by permuting bit positions: σ (t) = t1 t2 t3 t0 .

Let M ∈ {0, 1}^16 be a message. The digest h(M) is computed using the
algorithm:
Algorithm 10.5.6 (MD-4-Round)
Input: a message M ∈ {0, 1}^16 , divided into 4 half-bytes,
M0 , M1 , M2 , M3 .
Output: the digest h(M) ∈ {0, 1}^8 .
Algorithm:
(a, b) ← (s1 , s2 )
for i = 0 to 3 do
t ← f (a, b) ⊕ Mi
(a, b) ← (b, σ (t))
next i
output ab
Example 10.5.7 Let s1 = 1111, s2 = 0000. Let

h : {0, 1}^16 → {0, 1}^8

be the single round MD-4 hash function. We compute h(M) where

M = 1001101111000010.

In this case, M0 = 1001, M1 = 1011, M2 = 1100, M3 = 0010.


For i = 0,

t = f (1111, 0000) ⊕ 1001 = 0110

(a, b) = (0000, σ (0110)) = (0000, 1100).

For i = 1,

t = f (0000, 1100) ⊕ 1011 = 0111

(a, b) = (1100, σ (0111)) = (1100, 1110).

For i = 2,

t = f (1100, 1110) ⊕ 1100 = 1110

(a, b) = (1110, σ (1110)) = (1110, 1101).

For i = 3,

t = f (1110, 1101) ⊕ 0010 = 0001

(a, b) = (1101, σ (0001)) = (1101, 0010).



Thus

h(1001101111000010) = 11010010.
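Algorithm 10.5.6 and the trace above can be reproduced in a few lines of Python; the helper names `xor` and `sigma` are ours, and σ is implemented as the left rotation consistent with the worked values (σ(0110) = 1100, etc.).

```python
# Single-round MD-4-type hash of Example 10.5.7.
S1, S2 = '1111', '0000'                      # the fixed half-bytes s1, s2

def xor(x, y):
    # Bitwise XOR of two equal-length bit strings (the round function f).
    return ''.join(str(int(a) ^ int(b)) for a, b in zip(x, y))

def sigma(t):
    # The permutation sigma in S_4: sigma(t0 t1 t2 t3) = t1 t2 t3 t0.
    return t[1:] + t[0]

def h(m):
    # Algorithm 10.5.6: process the four half-bytes M0, M1, M2, M3.
    a, b = S1, S2
    for i in range(4):
        t = xor(xor(a, b), m[4 * i: 4 * i + 4])
        a, b = b, sigma(t)
    return a + b                             # output is the pair ab

assert h('1001101111000010') == '11010010'   # matches Example 10.5.7
```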


Proposition 10.5.8 Let h : {0, 1}^m → {0, 1}^t , m ≫ t, be an MD-4 hash function
with t ≥ 160. Assume that h is regular. Then h is a cryptographic hash function.
Proof The number of possible digests is 2^t , which is significantly smaller than the
number of messages, which is 2^m .
We show that the conditions of Definition 10.5.3 are satisfied.
The important condition is collision resistance. Because h is regular, if we choose
n messages at random from M = {0, 1}^m , then we are essentially choosing n digests
at random from h({0, 1}^m ).
By the first Collision Theorem (Proposition 2.3.1), a collision is likely if we
choose at least n = 1 + ⌈√(1.4 · 2^t )⌉ messages in {0, 1}^m . But since t ≥ 160, n is
at least √(2^160) = 2^80 .

So, in order to make a collision feasible (i.e., Pr > 1/2), our computer must
perform at least 2^80 operations (message selections), which is beyond the number
of operations that a computer can perform in a reasonable amount of time. Thus h
is collision resistant.



10.5.3 Hash-Then-Sign DSS

Hash functions improve the efficiency and security of RSA DSS, provided that
the hash function is cryptographic, i.e., has some (or all) of the properties of
Definition 10.5.3.
In a “hash-then-sign DSS” the digest of a message is signed instead of the
message.
Definition 10.5.9 (Hash-Then-Sign RSA DSS)
Step 1. Alice and Bob agree to use a discrete log hash function h : Zn → U (Zp ),
where n = pq is the RSA modulus.
Step 2. Alice establishes a public key and private key pair (sA , nA ), tA .
Step 3. Alice signs the message M ∈ Zn by first computing its digest h(M) and
then signing the digest as S = d(h(M), tA ) = (h(M)^tA mod nA ). She then
sends the pair (M, S) to Bob.
Step 4. Bob computes h(M) and using Alice’s public key, he authenticates the
message by verifying that

e(S, (sA , nA )) = e(d(h(M), tA ), (sA , nA )) = h(M).





The hash-then-sign RSA DSS described above is more efficient than RSA DSS
since Alice need only sign a much shorter digest h(M).
Moreover, the collision resistance of h ensures that the DSS is more secure than
ordinary RSA DSS. Suppose that S is Alice’s signature of the digest h(M). Since
it is infeasible for Malice to obtain a collision h(N) = h(M), it is not possible for
Malice to obtain a forgery (S, N ), where N is a message that is signed by Alice.
Finally, the non-homomorphism property of h makes it unlikely that the chosen-
message attack of Proposition 10.4.3 will succeed in producing a forgery.
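The hash-then-sign flow can be sketched by combining the discrete log hash with the RSA keys of Example 10.5.4, (s, n) = (149, 517) and t = 389; the function names are our own.

```python
# Hash-then-sign RSA DSS with the parameters of Example 10.5.4.
s, n, t = 149, 517, 389        # Alice's RSA keys; n = 47 * 11
p, r, g1, g2 = 47, 23, 5, 31   # discrete log hash parameters

def h(m):
    # Discrete log hash h : Z_517 -> U(Z_47).
    i, j = divmod(m, r)
    return pow(g1, i, p) * pow(g2, j, p) % p

M = 100
S = pow(h(M), t, n)            # Step 3: sign the digest, not the message
assert pow(S, s, n) == h(M)    # Step 4: Bob authenticates with (s, n)
```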

10.6 Exercises

1. Suppose that Alice is using the RSA Digital Signature Scheme with public key
(s, n) = (631, 7991) and private key t = 1471 to sign messages.
(a) Bob receives the message, signature pair

(M, S) = (7 4 11 11 14, 7374 2810 175 175 612)

presumably from Alice. Should Bob accept the message?


(b) Acting as Malice, produce a message, signature pair (M, S) that is a forgery
of Alice’s signature.
2. Most digital signature schemes are obtained from public key cryptosystems. Is it
possible to use a symmetric key cryptosystem to create a digital signature scheme
(in an analogous manner)?
3. Consider the Hash-Then-Sign RSA DSS with hash function

h : Z517 → U (Z47 )

of Example 10.5.4.
(a) Compute the digest h(312).
(b) Suppose that Bob receives the message, signature pair (312, 175) presum-
ably from Alice. Should he accept?
4. Given the discrete log hash function h : Zn → U (Zp ), p, q prime, n = pq,
find the minimum number of values h(M) required so that the probability of a
collision is > 3/4.
5. Let h : {0, 1}^16 → {0, 1}^8 be the MD-4-type hash function of Example 10.5.7.
(a) Compute h(1^16 ). (Here 1^16 denotes the string 111 · · · 1 of sixteen 1s.)
(b) Suppose that 10 values h(M) are computed. What is the probability that there
is at least one collision?

6. Show that the hash function of Example 10.5.7 is not a homomorphism with
respect to bit-wise addition modulo 2, i.e., show that

h(M ⊕ N ) ≠ h(M) ⊕ h(N )

for some M, N ∈ {0, 1}^16 .


7. Suppose that message M is a 25 letter sequence of English words. Thus, M can
be encoded over the alphabet {0, 1} as a string of ⌈25 · log2 (26)⌉ = 118 bits.
Let

h : {0, 1}^118 → {0, 1}^59

be a hash function that is regular, i.e., for each y ∈ {0, 1}^59 , there are exactly
2^59 strings x ∈ {0, 1}^118 for which h(x) = y.
(a) Assuming h regular, estimate the number of strings N ∈ {0, 1}118 for which
N is the encoding of a meaningful 25 letter sequence of English words.
(b) Estimate the number of strings N satisfying part (a), and for which h(N) =
h(M).
Hint: From Section 3.4, we have

lim_{n→∞} Hn /n = 1.5,

thus, we can assume that H25 /25 = 1.5, hence H25 = 37.5. Use this fact
to estimate the probability that a string in {0, 1}^118 is the encoding of a
meaningful sequence in English.
Chapter 11
Key Generation

The Vernam cipher of Section 8.5 is the only cryptosystem that has perfect secrecy
(Definition 8.5.8), because the key is a random sequence of bits of the same length
as the message (Proposition 8.5.9).
The problem studied in this chapter is to describe generators for “pseudorandom”
sequences of bits and try to determine how close they come to being truly random
sequences.
The pseudorandom sequences are produced by bit generators of the form

G : {0, 1}^l → {0, 1}^m ,

m ≫ l, where the random seed k ∈ {0, 1}^l is used to generate longer bit strings of
length m.
We can then use a pseudorandom sequence as a key stream in a stream cipher;
the stream cipher together with the pseudorandom key stream then exhibits “almost
perfect secrecy,” approximating the perfect secrecy of the Vernam cipher.

11.1 Linearly Recursive Sequences

In this section we show how a sequence of bits can be generated as a linearly


recursive sequence. Under certain conditions, these sequences are periodic with
period s. The first s bits of the sequence can be viewed as a key of length s in
the Vigenère cipher over the alphabet Σ = {0, 1}.
We begin with a review of the general theory of linearly recursive sequences over
a field.


Definition 11.1.1 Let K be a field, and let l > 0 be a positive integer. An lth-order
linearly recursive sequence in K is a sequence {sn }n≥0 for which

sn+l = al−1 sn+l−1 + al−2 sn+l−2 + · · · + a0 sn + a (11.1)

for some elements a, a0 , a1 , a2 , . . . , al−1 ∈ K and all n ≥ 0. The relation (11.1) is


the recurrence relation of the sequence.
For n ≥ 0, the vector sn = (sn , sn+1 , sn+2 , . . . , sn+l−1 ) is the nth state vector
of {sn }; s0 = (s0 , s1 , s2 , . . . , sl−1 ) is the initial state vector. A linearly recursive
sequence is completely determined by specifying the recurrence relation (11.1) and
initial state vector.
Example 11.1.2 Let K = Q, and let l = 3. Then the recurrence relation

sn+3 = 2sn+1 + sn , n ≥ 0,

and initial state vector s0 = (2, 0, 1) define the 3rd-order linearly recursive sequence

{sn } = 2, 0, 1, 2, 2, 5, 6, 12, . . .
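The terms of Example 11.1.2 can be generated directly from the recurrence relation; the helper name `lrs` is ours, and the coefficients are passed as (a0, . . . , a_{l−1}).

```python
# The 3rd-order linearly recursive sequence of Example 11.1.2:
# s_{n+3} = 2*s_{n+1} + s_n over Q, with initial state (2, 0, 1).

def lrs(initial, coeffs, count):
    # Generate `count` terms; coeffs = (a0, ..., a_{l-1}) in the
    # recurrence s_{n+l} = a_{l-1} s_{n+l-1} + ... + a0 s_n.
    seq = list(initial)
    while len(seq) < count:
        seq.append(sum(a * s for a, s in zip(coeffs, seq[-len(coeffs):])))
    return seq

print(lrs((2, 0, 1), (1, 2, 0), 8))   # [2, 0, 1, 2, 2, 5, 6, 12]
```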

The linearly recursive sequence {sn } is homogeneous if a = 0. The sequence


{sn } is eventually periodic if there exist integers N ≥ 0, t > 0 for which sn+t = sn ,
for all n ≥ N. The sequence {sn } is periodic if {sn } is eventually periodic with
N = 0; that is, {sn } is periodic if there exists an integer t > 0 so that sn+t = sn for
all n ≥ 0.
Suppose {sn } is eventually periodic. Then the smallest positive integer r for
which sn+r = sn for all n ≥ N is the period of {sn }.
We assume that all linearly recursive sequences are homogeneous.
Homogeneous linearly recursive sequences can be described in terms of matrices.
Let {sn } be an lth-order linearly recursive sequence with recurrence relation (11.1)
(a = 0). Let A be the l × l matrix defined as
        ⎛ 0    1    0   · · ·   0     0   ⎞
        ⎜ 0    0    1   · · ·   0     0   ⎟
        ⎜ ⋮    ⋮    ⋮     ⋱     ⋮     ⋮   ⎟
    A = ⎜ 0    0    0   · · ·   1     0   ⎟
        ⎜ 0    0    0   · · ·   0     1   ⎟
        ⎝ a0   a1   a2  · · ·  al−2  al−1 ⎠

Let M T denote the transpose of a matrix M.


Proposition 11.1.3 With A defined as above,

sTn = An sT0

for all n ≥ 0.

Proof Use induction on n. The trivial case is n = 0: sT0 = Il sT0 , where Il denotes
the l × l identity matrix. For the induction hypothesis, assume that sTn−1 = An−1 sT0 .
Then AsTn−1 = AAn−1 sT0 , hence sTn = An sT0 .
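Proposition 11.1.3 is easy to check numerically. The sketch below (plain Python; the helper name is ours) iterates A on the initial state of Example 11.1.2 and compares against the sequence terms:

```python
def mat_vec(A, v):
    """Multiply matrix A (list of rows) by column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Matrix of the sequence s_{n+3} = 2 s_{n+1} + s_n  (a_0 = 1, a_1 = 2, a_2 = 0)
A = [[0, 1, 0],
     [0, 0, 1],
     [1, 2, 0]]

s = [2, 0, 1, 2, 2, 5, 6, 12]     # terms from Example 11.1.2
state = [2, 0, 1]                  # s_0
for n in range(1, 6):
    state = mat_vec(A, state)      # A^n s_0^T, computed iteratively
    assert state == s[n:n + 3], (n, state)
print("A^n s_0^T = s_n for n = 1..5")
```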


The matrix A is the matrix of the homogeneous linearly recursive sequence. Let
{sn } be an lth-order linearly recursive sequence with matrix A. The characteristic
polynomial of {sn } is the characteristic polynomial of A in the usual sense; that is,
the characteristic polynomial of {sn } is

f (x) = det(xIl − A).

Proposition 11.1.4 Let {sn } be an lth-order linearly recursive sequence defined


by (11.1). Let f (x) be the characteristic polynomial of {sn }. Then

f (x) = x l − al−1 x l−1 − al−2 x l−2 − · · · − a0 ∈ K[x].

Proof We proceed by induction on the order l of the linearly recursive sequence.


For a detailed proof, see [61, Proposition 5.13].


Let K be a field, let

g(x) = bt x t + bt−1 x t−1 + · · · + b1 x + b0

be a polynomial in K[x], and let A be a matrix in Matl (K). Then by the evaluation
g(A) we mean the linear combination of matrices

g(A) = bt At + bt−1 At−1 + · · · + b1 A + b0 Il .

The polynomial g(x) annihilates A if g(A) = 0, where 0 denotes the l × l zero


matrix. The set of all polynomials in K[x] that annihilate A is a non-zero ideal
J of K[x] (Section 11.5, Exercise 6). By Proposition 7.1.1, J is a principal ideal,
i.e., J = (m(x)), for some monic polynomial m(x). The polynomial m(x) is the
minimal polynomial of A.
Proposition 11.1.5 Let A be a matrix in Matl (K) of the form
        ⎛ 0    1    0   · · ·   0     0   ⎞
        ⎜ 0    0    1   · · ·   0     0   ⎟
        ⎜ ⋮    ⋮    ⋮     ⋱     ⋮     ⋮   ⎟
    A = ⎜ 0    0    0   · · ·   1     0   ⎟
        ⎜ 0    0    0   · · ·   0     1   ⎟
        ⎝ a0   a1   a2  · · ·  al−2  al−1 ⎠

Then the characteristic polynomial of A is the minimal polynomial of A.


Proof The proof of this well-known result uses the Cayley–Hamilton theorem; see
[24, Section 7.1, Corollary].


An application of Proposition 11.1.5 says that the characteristic polynomial of a
linearly recursive sequence is the minimal polynomial of its matrix.
For the remainder of this section, we are going to consider lth-order linearly
recursive sequences {sn } over the field K = Fpm , the Galois field of pm elements.
Recall that a polynomial f (x) ∈ Fpm [x] of degree l is primitive if f (x) is
irreducible and there exists a root α of f (x) that generates the cyclic group of units
U (Fpm (α)) = Fpm (α)× .
We will prove the fundamental result on lth-order linearly recursive sequences in
Fpm : if the characteristic polynomial of the sequence is primitive, then the sequence
is periodic with maximal period r = p^{ml} − 1. We begin by showing that every
linearly recursive sequence in a finite field is periodic.
Proposition 11.1.6 Let {sn } be an lth-order linearly recursive sequence in Fpm .
Then {sn } is eventually periodic with period 1 ≤ r ≤ p^{ml} − 1.
Proof If si = (0, 0, . . . , 0) (the zero vector of length l) for some i ≥ 0, then {sn } is
eventually periodic with N = i and r = 1 ≤ p^{ml} − 1. So we assume that sn is not the zero vector for all
n ≥ 0.
Observe that there are precisely p^{ml} − 1 distinct l-tuples of elements in Fpm other
than the zero vector. Thus there exist integers i, j , 0 ≤ i < j ≤ p^{ml} − 1, so that
sj = si . We claim that sn+j −i = sn for all n ≥ i. To prove this claim, we proceed by
induction on n ≥ i, with the trivial case n = i already established. For the induction
hypothesis, we assume that sn+j −i = sn holds for n = i + ω, where ω ≥ 0 is a fixed
integer. Thus sω+j = si+ω , and hence

(sω+j , sω+j +1 , . . . , sω+j +l−1 ) = (si+ω , si+ω+1 , . . . , si+ω+l−1 ). (11.2)

Consequently

sω+1+j +η = si+ω+1+η ,

for η = 0, 1, 2, . . . , l − 2. Moreover, from (11.2) and the recurrence relation (11.1)


one obtains

sω+1+j +η = si+ω+1+η ,

for η = l − 1. Thus

(s (ω+1)+j , s(ω+1)+j +1 , . . . , s(ω+1)+j +l−1 )


= (si+(ω+1) , si+(ω+1)+1 , . . . , si+(ω+1)+l−1 ),

which yields sn+j −i = sn for n = i + (ω + 1). The proof by induction is complete


and hence sn+(j −i) = sn for all n ≥ i. Thus {sn } is eventually periodic with N = i
and period r satisfying 1 ≤ r ≤ j − i ≤ p^{ml} − 1.


Proposition 11.1.7 Let {sn } be an lth-order linearly recursive sequence in Fpm with
recurrence relation (11.1). If a0 ≠ 0, then {sn } is periodic with period 1 ≤ r ≤
p^{ml} − 1.
Proof By Proposition 11.1.6, {sn } is eventually periodic with period 1 ≤ r ≤ p^{ml} −
1. Suppose N0 is the smallest integer for which sn+r = sn for all n ≥ N0 . If N0 = 0,
then {sn } is periodic. So we assume that N0 ≥ 1. But now from (11.1),

sN0 −1+r = a0^{−1} (s(N0 −1+r)+l − al−1 s(N0 −1+r)+l−1 − · · · − a1 s(N0 −1+r)+1 )

         = a0^{−1} (sN0 +l−1+r − al−1 sN0 +l−2+r − · · · − a1 sN0 +r )

         = a0^{−1} (sN0 +l−1 − al−1 sN0 +l−2 − · · · − a1 sN0 )

         = a0^{−1} (s(N0 −1)+l − al−1 s(N0 −1)+l−1 − · · · − a1 s(N0 −1)+1 )
         = sN0 −1 .

Thus sn+r = sn for all n ≥ N0 − 1, which contradicts the minimality of N0 .




Let {sn } be an lth-order linearly recursive sequence with recurrence relation (11.1).
In what follows we assume that a0 ≠ 0 so that {sn } is periodic with
period r, 1 ≤ r ≤ p^{ml} − 1. Let
        ⎛ 0    1    0   · · ·   0     0   ⎞
        ⎜ 0    0    1   · · ·   0     0   ⎟
        ⎜ ⋮    ⋮    ⋮     ⋱     ⋮     ⋮   ⎟
    A = ⎜ 0    0    0   · · ·   1     0   ⎟
        ⎜ 0    0    0   · · ·   0     1   ⎟
        ⎝ a0   a1   a2  · · ·  al−2  al−1 ⎠

be the matrix of {sn }. Observe that det(A) = (−1)^{l+1} a0 and, consequently, A is


an invertible element of GLl (Fpm ), a finite group. Hence A has finite order in
GLl (Fpm ).
Proposition 11.1.8 Let {sn } be an lth-order linearly recursive sequence in Fpm .
Then the period r of {sn } divides the order of A in GLl (Fpm ).
Proof Let w be the order of A. Then by Proposition 11.1.3

sTn+w = An+w sT0 = An sT0 = sTn ,



for all n ≥ 0. Thus r ≤ w. There exist integers t, u so that w = rt + u with


0 ≤ u < r, t > 0. Now, for all n ≥ 0,

sTn = sTn+w = sTn+rt+u = sTn+r+r(t−1)+u = sTn+r(t−1)+u ,

sTn+r(t−1)+u = sTn+r+r(t−2)+u = sTn+r(t−2)+u ,

sTn+r(t−2)+u = sTn+r+r(t−3)+u = sTn+r(t−3)+u ,

..
.

sTn+r+u = sTn+u .

Consequently, sTn = sTn+u . Since r is the period of {sn }, u = 0, and so r | w.




Proposition 11.1.9 Let {sn } be an lth-order linearly recursive sequence in Fpm . Let
A be the matrix of {sn }, and let f (x) be the characteristic polynomial of A. Assume
that a0 ≠ 0. Then order(f (x)) equals the order of the matrix A in the finite group
GLl (Fpm ).
Proof Recall that order(f (x)) is the smallest positive integer e for which f (x) |
x e − 1, cf. Proposition 7.3.8.
The order of A is the smallest positive integer w so that Aw − Il = 0. That
is, w is the smallest positive integer so that x w − 1 is in the annihilator ideal of
A. Consequently, the minimal polynomial m(x) | x w − 1. By Proposition 11.1.5,
f (x) = m(x), and so f (x) | x w − 1. Thus order(f (x)) ≤ w. If order(f (x)) < w,
then there exists an integer q, 0 < q < w, with f (x) | x q − 1. Consequently,
Aq − Il = 0, which contradicts our assumption that w is the order of A. It follows
that order(f (x)) = w.
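For small polynomials, order(f(x)) — the least e with f(x) | x^e − 1, cf. Proposition 7.3.8 — can be computed directly. A Python sketch (our own encoding: a polynomial over F2 is an integer bitmask with bit i the coefficient of x^i) confirms that x^4 + x + 1 has order 15:

```python
def polymod(p, f):
    """Reduce bitmask polynomial p modulo f over F_2."""
    d = f.bit_length() - 1
    while p.bit_length() - 1 >= d:
        p ^= f << (p.bit_length() - 1 - d)
    return p

def poly_order(f):
    """Least e >= 1 with x^e ≡ 1 (mod f); equals order(f) when f is
    irreducible with nonzero constant term."""
    e, p = 1, polymod(0b10, f)     # start from x mod f
    while p != 1:
        p = polymod(p << 1, f)     # multiply by x and reduce
        e += 1
    return e

print(poly_order(0b10011))   # x^4 + x + 1 is primitive: order 15 = 2^4 - 1
print(poly_order(0b11111))   # x^4 + x^3 + x^2 + x + 1: irreducible but not primitive, order 5
```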


Proposition 11.1.10 Let {sn } be an lth-order linearly recursive sequence in Fpm .
Let A be the matrix of {sn }, and let f (x) be the characteristic polynomial of A.
Assume that a0 ≠ 0. Let r be the period of {sn }. Then r | order(f (x)).
Proof Let w be the order of A in GLl (Fpm ). By Proposition 11.1.8, r | w. By
Proposition 11.1.9, w = order(f (x)). Thus r | order(f (x)).


Proposition 11.1.11 Let {sn } be an lth-order linearly recursive sequence in Fpm .
Let A be the matrix of {sn }, and let f (x) be the characteristic polynomial of A.
Assume that a0 ≠ 0. Let r be the period of {sn }. If f (x) is irreducible over Fpm ,
then r = order(f (x)).

Proof By Proposition 11.1.10, r | order(f (x)). Thus r ≤ order(f (x)). By [61,


Proposition 5.21],

f (x)v(x) = (1 − x r )w(x)

for some v(x), w(x) ∈ Fpm [x], w(x) ≠ 0, and deg(w(x)) ≤ l − 1. Thus f (x) |
(1 − x r )w(x). Since f (x) is irreducible, either f (x) | 1 − x r or f (x) | w(x). Since
deg(w(x)) < deg(f (x)), one has f (x) | 1 − x r and so, order(f (x)) ≤ r.


Here is the key result regarding linearly recursive sequences.
Theorem 11.1.12 Let {sn } be an lth-order linearly recursive sequence in Fpm with
characteristic polynomial f (x). Assume that a0 ≠ 0 and let r be the period of {sn }.
If f (x) is primitive over Fpm , then r = p^{ml} − 1.
Proof By Proposition 11.1.11, r = order(f (x)). By Proposition 7.3.12,
order(f (x)) is the order of any root α of f (x) in Fpm (α)× . Since f (x) is primitive
of degree l, the order of α is p^{ml} − 1.


Example 11.1.13 By Example 7.3.16, f (x) = x^4 + x + 1 is a primitive polynomial
over F2 . Thus we can apply Theorem 11.1.12 to produce a 4th-order linearly
recursive sequence {sn } in F2 that has maximal period r = 2^4 − 1 = 15. From
f (x) = x^4 + x + 1, we obtain the recurrence relation

sn+4 = sn+1 + sn , n ≥ 0.

Choosing the initial state vector s0 = 0110, we obtain the sequence

011010111100010011010111100010011 . . .

of maximal period 15. As another example, the initial state vector s0 = 0001 yields
the sequence

000100110101111000100110101111000 . . .

of period 15. We have the bit generator

G : {0, 1}^4 → {0, 1}^m ,

m ≥ 4, defined as

G(s0 ) = s0 s1 s2 s3 . . .

For instance,

G(0110) = 011010111100010011010111100010011 . . .


Let M, C, e, d, K denote the Vigenère cipher over the alphabet {0, 1}. We
have M = C = {0, 1}^r and K = {0, 1}^s , where r, s ≥ 1 are integers. The integer r
is the message length, and s is the key length. We take s = 15.
Let

k = 011010111100010

be the first 15 terms of the linearly recursive sequence G(0110) of Example 11.1.13.
Then k is a key for the Vigenère cipher.
The message M = 10110011000101001110 is encrypted by vertical addition
modulo 2:

1 0 1 1 0 0 1 1 0 0 0 1 0 1 0 0 1 1 1 0
0 1 1 0 1 0 1 1 1 1 0 0 0 1 0 0 1 1 0 1
1 1 0 1 1 0 0 0 1 1 0 1 0 0 0 0 0 0 1 1
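The vertical addition above is just bitwise XOR of the message with the key repeated to the message length. A short Python sketch (the function name is ours) reproducing the computation:

```python
def vigenere_binary(message, key):
    """Encrypt a bit string with a repeating binary key (addition mod 2 = XOR)."""
    return "".join(str((int(m) + int(key[i % len(key)])) % 2)
                   for i, m in enumerate(message))

k = "011010111100010"                 # first 15 terms of G(0110)
m = "10110011000101001110"
c = vigenere_binary(m, k)
print(c)                              # 11011000110100000011
assert vigenere_binary(c, k) == m     # decryption is the same operation
```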

Theorem 11.1.12 suggests that we can construct a bit generator with a very large
period by finding a primitive polynomial over F2

f (x) = x^l − al−1 x^{l−1} − al−2 x^{l−2} − · · · − a0

of large degree l and setting

sn+l = al−1 sn+l−1 + al−2 sn+l−2 + · · · + a0 sn

for n ≥ 0. Let s0 = (s0 , s1 , . . . , sl−1 ), si ∈ F2 , be an initial state vector. Then the


recurrence relation defines a bit stream {sn } with period 2^l − 1.
Example 11.1.14 It is known that

f (x) = x^20 + x^19 + x^4 + x^3 + 1

is a primitive polynomial in F2 [x]. The polynomial f defines a 20th-order linearly


recursive sequence {sn } with recurrence relation

sn+20 = sn+19 + sn+4 + sn+3 + sn

for n ≥ 0. Since f is primitive, {sn } has period

2^20 − 1 = 1,048,575.



We let

s0 = 01100010100010001110

be the randomly chosen initial state vector.


The first 500 terms of {sn } are computed using the simple GAP program:
s:=List([1..500]);
s[1]:=0; s[2]:=1; s[3]:=1; s[4]:=0;
s[5]:=0; s[6]:=0; s[7]:=1; s[8]:=0;
s[9]:=1; s[10]:=0; s[11]:=0; s[12]:=0;
s[13]:=1; s[14]:=0; s[15]:=0; s[16]:=0;
s[17]:=1; s[18]:=1; s[19]:=1; s[20]:=0; #initial state vector
for j in [1..500] do;
s[j+20]:=(s[j+19]+s[j+4]+s[j+3]+s[j]) mod 2;
Print(s[j]);
od;

The sequence is
011000101000100011100110101110000001001011110101000100110011
111101110010111000100010100011010110000101000010000001010101
101000111010110000110000011111110100110110000101111010101110
101001111110001000010100010010011101010111001110110000111010
011001110100011101011100111000001111101001101011000011110101
000000101111111101100010110010100011100100000010111110111110
001011001110110010010000011001111110000110011100010010100111
0010110011010100100011...

The bit generator is

G : {0, 1}^20 → {0, 1}^m ,

m ≥ 20.


Despite having large periods, bit streams as in Example 11.1.14 cannot be
considered cryptographically secure.
Proposition 11.1.15 Let {sn } be an lth-order linearly recursive sequence defined
by the degree l primitive polynomial f (x) over F2 . Suppose Malice has obtained 2l
consecutive terms of {sn }. Then Malice can compute all of the remaining terms of
{sn } in O(l^3) steps.

Proof We can assume without loss of generality that Malice has obtained the first
2l terms of {sn }. Consequently, there is a system of equations


⎧ a0 s0   + a1 s1 + a2 s2   + · · · + al−1 sl−1  = sl
⎪ a0 s1   + a1 s2 + a2 s3   + · · · + al−1 sl    = sl+1
⎨ a0 s2   + a1 s3 + a2 s4   + · · · + al−1 sl+1  = sl+2
⎪   ⋮
⎩ a0 sl−1 + a1 sl + a2 sl+1 + · · · + al−1 s2l−2 = s2l−1

in the variables a0 , a1 , . . . , al−1 . This system has a unique solution, which can be
obtained using Gaussian elimination in O(l^3) bit operations.


Thus, for a key stream generated by a primitive polynomial of degree 20, and hence
of period 2^20 − 1, recovering the recurrence would require only 40 consecutive terms
of the sequence and ≈ 20^3 = 8000 bit operations, easily within the range of a modern
digital computer. Malice’s attack on the key stream would succeed.
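The attack of Proposition 11.1.15 is easy to carry out. The sketch below (our own helper, written in Python; it assumes the l × l coefficient matrix is nonsingular, as it is here) solves the linear system over F2 by Gaussian elimination and recovers the coefficients of the degree-4 recurrence s_{n+4} = s_{n+1} + s_n from 2l = 8 consecutive bits:

```python
def recover_recurrence(bits, l):
    """Recover a_0, ..., a_{l-1} with s_{n+l} = a_{l-1} s_{n+l-1} + ... + a_0 s_n
    (mod 2) from 2l consecutive bits, by Gaussian elimination over F_2."""
    rows = [bits[n:n + l] + [bits[n + l]] for n in range(l)]   # augmented matrix
    for col in range(l):
        piv = next(r for r in range(col, l) if rows[r][col])   # find a pivot
        rows[col], rows[piv] = rows[piv], rows[col]
        for r in range(l):
            if r != col and rows[r][col]:                      # clear the column
                rows[r] = [x ^ y for x, y in zip(rows[r], rows[col])]
    return [rows[i][l] for i in range(l)]

# 8 consecutive bits of the sequence G(0110) from Example 11.1.13
bits = [0, 1, 1, 0, 1, 0, 1, 1]
print(recover_recurrence(bits, 4))   # [1, 1, 0, 0]: s_{n+4} = s_{n+1} + s_n
```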

11.2 The Shrinking Generator Sequence

In order to use linearly recursive sequences to generate key streams for stream
ciphers with a reasonable level of security, one must combine several sequences
together in a way that eliminates the linearity that leads to Malice’s successful attack
given in Proposition 11.1.15.
One way to do this is to use two linearly recursive sequences in the following
way. Let {sn }, {tn } be linearly recursive sequences in F2 of order k, l, respectively.
In the sequence {sn }, suppose that the 1s occur in the terms

sn0 , sn1 , sn2 , sn3 , . . . , where n0 < n1 < n2 < · · ·

Define a new sequence {vm } by the rule:

vm = tnm ,

for m ≥ 0. The sequence {vm }, m ≥ 0, is the shrinking generator sequence. The


sequence s is the selector sequence.
The shrinking generator sequence was introduced by D. Coppersmith, H.
Krawczyk, and Y. Mansour in the paper [18].
Example 11.2.1 (Shrinking Generator Sequence) Let {sn } be the 4th-order linearly
recursive sequence in F2 with recurrence relation sn+4 = sn+1 + sn and initial state
s0 = 0100. The sequence {sn } has period 15 and begins

010011010111100 . . .

Let {tn } be the 5th-order linearly recursive sequence in F2 with recurrence relation
tn+5 = tn+2 +tn and initial state t0 = 10110. The sequence has period 31 and begins

1011001111100011011101010000100 . . .

The shrinking generator sequence {vm }, m ≥ 0, is

000111000100000 . . .
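The construction is easy to simulate. The sketch below (Python; the helper names are ours) reproduces Example 11.2.1 and checks that the period divides 2^{k−1}(2^l − 1) = 8 · 31 = 248, as Proposition 11.2.4 below will show:

```python
def lfsr(k, taps, state, count):
    """Binary kth-order sequence: s_{n+k} = sum of s_{n+t}, t in taps (mod 2)."""
    s = list(state)
    for n in range(count - k):
        s.append(sum(s[n + t] for t in taps) % 2)
    return s

def shrink(s, t):
    """Shrinking generator: output t_n exactly when the selector bit s_n is 1."""
    return [tb for sb, tb in zip(s, t) if sb == 1]

n = 3000
s = lfsr(4, (0, 1), [0, 1, 0, 0], n)      # s_{n+4} = s_{n+1} + s_n, period 15
t = lfsr(5, (0, 2), [1, 0, 1, 1, 0], n)   # t_{n+5} = t_{n+2} + t_n, period 31
v = shrink(s, t)
print("".join(map(str, v[:15])))           # 000111000100000
print(v[:248] == v[248:496])               # True: the period divides 8 * 31 = 248
```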


The shrinking generator sequence is periodic, though its period is significantly
larger than those of the linearly recursive sequences used to compute it.
To compute the period of the shrinking generator sequence, we begin with some
lemmas.
Lemma 11.2.2 Let {sn } be a kth-order linearly recursive sequence in F2 with
primitive characteristic polynomial f (x). Then there are 2^{k−1} occurrences of 1 in
the first 2^k − 1 terms of {sn }.
Proof Note that the period of {sn } is 2^k − 1. The state vectors

s0 , s1 , s2 , . . . , s_{2^k −2}

constitute all possible non-zero vectors of length k over F2 (see Section 11.5,
Exercise 11). Now, there are exactly

∑_{i=1}^{k} i · (k choose i)

total occurrences of 1 in the listing of the state vectors s0 , . . . , s_{2^k −2} .


An occurrence of 1 in the first 2^k − 1 terms of the sequence {sn } determines an
occurrence of 1 in k consecutive state vectors. And so there are precisely

( ∑_{i=1}^{k} i · (k choose i) ) / k = ∑_{i=0}^{k−1} ((k−1) choose i) = 2^{k−1}

occurrences of 1 in the first 2^k − 1 terms of {sn }.
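For k = 4 the lemma predicts 2^{4−1} = 8 ones among the first 2^4 − 1 = 15 terms. A quick check in Python (the helper name is ours), using the period-15 selector sequence of Example 11.2.1:

```python
def lfsr(k, taps, state, count):
    """Binary kth-order sequence: s_{n+k} = sum of s_{n+t}, t in taps (mod 2)."""
    s = list(state)
    for n in range(count - k):
        s.append(sum(s[n + t] for t in taps) % 2)
    return s

# One full period of s_{n+4} = s_{n+1} + s_n with initial state 0100
s = lfsr(4, (0, 1), [0, 1, 0, 0], 15)
print(sum(s))  # 8 ones in the first 15 terms, as the lemma predicts
```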



Lemma 11.2.3 Let {sn } be a kth-order linearly recursive sequence in F2 with
primitive characteristic polynomial f (x). For convenience of notation, we write
s(n) = sn , n ≥ 0. Let rs = 2^k − 1 denote the (maximal) period of s. Fix b, c ≥ 0
with gcd(c, rs ) = 1, and let s(b + ac), a ≥ 0, denote the subsequence of s. Then the
period of s(b + ac) is rs .

Proof Let A be the matrix of s. Then s(b + ac) is determined by the matrix A^c .
Now, the characteristic polynomial of A is f (x), and since f is primitive, there is a
root α of f (x) for which ⟨α⟩ = F2 (α)^× = F_{2^k}^× . Note that |F_{2^k}^× | = rs .
Let g(x) be the characteristic polynomial of A^c . We have deg(g(x)) = k. Since
gcd(c, rs ) = 1, α^c is a root of g(x) which satisfies ⟨α^c⟩ = F_{2^k}^× ; hence g(x)
is a primitive polynomial over F2 (we could have f (x) = g(x), but this is not
necessary).
It follows that s(b + ac) is a kth-order linearly recursive sequence with maximal
period 2^k − 1 = rs .


Proposition 11.2.4 Let {sn } be a kth-order linearly recursive sequence in F2 with
characteristic polynomial f (x) (the selector sequence), and let {tn } be an lth-order
linearly recursive sequence in F2 with characteristic polynomial g(x). Assume both
f (x) and g(x) are primitive polynomials over F2 and that gcd(k, l) = 1 with k ≤ l.
Let {vm } be the shrinking generator sequence constructed from {sn } and {tn }. Then
{vm } has period 2^{k−1} (2^l − 1).
Proof Note that the period of {sn } is 2^k − 1 and the period of {tn } is 2^l − 1. Let
rs = 2^k − 1, rt = 2^l − 1, and w = 2^{k−1} . We first note that gcd(k, l) = 1 implies that
gcd(rs , rt ) = 1 (see Section 11.5, Exercise 12). Note also that k ≤ l, and so, k ≤ rt .
From Lemma 11.2.2, there are w occurrences of 1 in the first rs terms of {sn }.
For the purposes of this proof, we let s(n) = sn , t (n) = tn , and v(m) = vm for
m, n ≥ 0.
By the definition of v, for each m ≥ 0,

v(m + aw) = t (nm + ars ),

for a ≥ 0. With a = rt ,

v(m + rt w) = t (nm + rt rs ) = t (nm ) = v(m),

for all m ≥ 0 and so, v is periodic with period rv ≤ wrt .


By the Division Algorithm,

wrt = rv q + r,

for integers q, r, 0 ≤ r < rv . Assume that r > 0. Then

v(m) = v(m + wrt ) = v(m + rv q + r) = v(m + r), ∀m,

which contradicts that rv is the period of v. Thus, rv | wrt .


We claim that wrt | rv , which will show that rv = wrt = 2^{k−1} (2^l − 1), and the
proof will be completed.

To this end, we have

v(m + aw) = v(m + rv + aw), ∀a

thus

t (nm + ars ) = t (nm+rv + ars ),

for all m, a ≥ 0. Thus,

t ((nm+rv − nm ) + nm + ars ) = t (nm + ars ),

for all m, a ≥ 0.
Now applying Lemma 11.2.3 to the sequence t with b = nm and c = rs , yields
that the period of the subsequence t (nm + ars )a≥0 is rt . Thus

rt | (nm+rv − nm ), ∀m. (11.3)

We claim that (11.3) implies that w | rv .


From (11.3), we have that for each m ≥ 0, there exists an integer dm for which

nm+rv = nm + dm rt . (11.4)

Thus

nm+1+rv = nm+1 + dm+1 rt . (11.5)

Subtracting (11.4) from (11.5) yields

nm+1+rv − nm+rv = nm+1 − nm + (dm+1 − dm )rt . (11.6)

If

nm+1+rv − nm+rv > nm+1 − nm ,

then

(dm+1 − dm )rt > 0,

and so

nm+1+rv − nm+rv > rt .

Consequently, the sequence s contains more than rt consecutive 0s and since k ≤ rt ,


this implies that s is the sequence of 0s, a contradiction.

If

nm+1+rv − nm+rv < nm+1 − nm ,

then

(dm+1 − dm )rt < 0,

and so

nm+1 − nm > rt ,

thus, the sequence s contains more than rt consecutive 0s, again a contradiction. We
conclude that

nm+rv +1 − nm+rv = nm+1 − nm .

This says that the subsequence s(i), i ≥ nm , is identical to the subsequence s(j ),
j ≥ nm+rv . Thus rs | (nm+rv − nm ) and so, the number of terms in s from s(nm ) to
s(nm+rv − 1) is a multiple of rs . It follows that the number of 1’s in s from s(nm ) to
s(nm+rv − 1) is a multiple of w. But the number of 1s is exactly rv , so

wc = rv , (11.7)

for some c. Now,

t (n0 ) = v(0) = v(rv ) = v(cw) = t (n0 + crs ). (11.8)

Next, let t (i) (n) denote the sequence

t (i) (n) = t (n + i)

for 0 ≤ i ≤ rt , n ≥ 0. The sequence t (i) is an lth-order linearly recursive sequence


with exactly the same primitive characteristic polynomial g(x) as t. Using the
formula (11.8), we obtain

t (n0 + i) = t (i) (n0 ) = v(0) = t (i) (n0 + crs ) = t ((n0 + i) + crs ).

It follows that rt divides crs . Since gcd(rs , rt ) = 1, this implies that rt divides c; we
have rt d = c. Now by (11.7),

rv = wrt d,

hence wrt divides rv . We conclude that the period of {vm } is 2^{k−1} (2^l − 1).



As suggested by Proposition 11.2.4, a stream cipher in which the key stream is


given by a shrinking generator sequence is more secure than one in which the key
stream is generated by a linearly recursive sequence. See Section 11.3 in which we
compute the “linear complexity” of the shrinking generator sequence.
For a survey of other ways to combine linearly recursive sequences to obtain
cryptographically secure bit streams, see [56, Section 12.3].

11.3 Linear Complexity

Let {sn }n≥0 be an arbitrary sequence over F2 , i.e., {sn } is a sequence of bits. For
N ≥ 1, consider the first N terms of {sn }:
{sn }_{n=0}^{N−1} = s0 , s1 , s2 , . . . , sN −1 .

We ask: can {sn }_{n=0}^{N−1} be generated as the first N terms of a linearly recursive
sequence? The answer is “yes.” Let {sn } be an Nth-order (homogeneous) linearly
recursive sequence of bits with recurrence relation

sn+N = aN −1 sn+N −1 + aN −2 sn+N −2 + · · · + a1 sn+1 + a0 sn , n ≥ 0,

and initial state vector s0 = (s0 , s1 , s2 , . . . , sN −1 ). Then certainly, the first N terms
of {sn } are s0 , s1 , . . . , sN −1 .
So the question is now: what is the smallest integer L > 0 so that the first N
terms s0 , s1 , . . . , sN −1 can be generated as the first N terms of an Lth-order linearly
recursive sequence? That is, what is the smallest integer L > 0 for which the terms
can be generated by a recurrence relation

sn+L = aL−1 sn+L−1 + aL−2 sn+L−2 + · · · + a1 sn+1 + a0 sn ,

for 0 ≤ n ≤ N − L − 1?
Definition 11.3.1 For N ≥ 1, the Nth linear complexity of the sequence s = {sn }
is the length L of a shortest recurrence relation

sn+L = aL−1 sn+L−1 + aL−2 sn+L−2 + · · · + a1 sn+1 + a0 sn ,

0 ≤ n ≤ N − L − 1, that is satisfied by the first N terms of {sn }. We denote the N th


linear complexity of s as Ls,N .
Example 11.3.2 Let N ≥ 1 and let s = {sn } be so that sn = 0 for 0 ≤ n ≤ N − 1.
Then (by convention) we have Ls,N = 0.



Example 11.3.3 Let N ≥ 2 and let s = {sn } be so that sn = 0 for 0 ≤ n ≤ N − 2


and sN −1 = 1. Then we have Ls,N = N.


Example 11.3.4 Let {sn } be the 4th order linearly recursive sequence over F2 with
recurrence relation

sn+4 = sn+1 + sn , n≥0

and initial state vector s0 = 0001. We have

{sn } = 000100110101111000100110101111000 . . .

Thus Ls,1 = Ls,2 = Ls,3 = 0 and Ls,4 = 4. Moreover, Ls,N ≤ 4 for N ≥ 5 since
{sn } is a 4th-order linearly recursive sequence.
In fact, Ls,N = 4 for N ≥ 5: if Ls,N < 4 for some N ≥ 5, then the first Ls,N ≤ 3
terms of s are 0, so the (homogeneous) recurrence would force s to begin with N 0s,
contradicting s3 = 1.


We need an algorithm that will enable us to compute Ls,N precisely.
Lemma 11.3.5 Let {sn } be a sequence of bits. Then the first N terms of {sn } are
generated by a recurrence relation of length L if and only if the system of equations

⎧ a0 s0      + a1 s1   + a2 s2      + · · · + aL−1 sL−1 = sL
⎪ a0 s1      + a1 s2   + a2 s3      + · · · + aL−1 sL   = sL+1
⎨ a0 s2      + a1 s3   + a2 s4      + · · · + aL−1 sL+1 = sL+2
⎪   ⋮
⎩ a0 sN −L−1 + a1 sN −L + a2 sN −L+1 + · · · + aL−1 sN −2 = sN −1

has a solution a0 , a1 , a2 , . . . , aL−1 .


Proof This is clear.


Algorithm 11.3.6 (N_LIN_COMP)
Input: a sequence of N bits, s = s0 , s1 , . . . , sN −1
Output: Ls,N
Algorithm:
lower ← 0, upper ← N.
while (upper − lower) ≥ 2 do
Step 1. Compute L = (lower + upper)/2.
Use Lemma 11.3.5 to determine whether s is generated
by a recurrence relation of length L.
Step 2. If “yes” in Step 1, then set upper = L.
If “no” in Step 1, then set lower = L.

end-while
output upper = Ls,N
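Algorithm 11.3.6 combines the solvability test of Lemma 11.3.5 with a binary search. A Python sketch (Gaussian elimination over F2; the function names are ours), checked against Examples 11.3.7 and 11.3.8:

```python
def generated_by(bits, L):
    """Lemma 11.3.5: is there a length-L recurrence fitting the first N bits?"""
    N = len(bits)
    rows = [bits[n:n + L] + [bits[n + L]] for n in range(N - L)]  # augmented
    pivots = []
    for r in range(len(rows)):
        for prow, c in pivots:              # reduce by the pivots found so far
            if rows[r][c]:
                rows[r] = [x ^ y for x, y in zip(rows[r], prow)]
        lead = next((c for c in range(L) if rows[r][c]), None)
        if lead is None:
            if rows[r][L]:                  # row reads 0 = 1: inconsistent
                return False
        else:
            pivots.append((rows[r], lead))
    return True

def n_lin_comp(bits):
    """Algorithm 11.3.6 (N_LIN_COMP): binary search for the Nth linear complexity."""
    lower, upper = 0, len(bits)
    while upper - lower >= 2:
        L = (lower + upper) // 2
        if generated_by(bits, L):
            upper = L
        else:
            lower = L
    return upper

print(n_lin_comp([0, 0, 0, 1, 0, 0, 1, 1]))  # 4  (Example 11.3.7)
print(n_lin_comp([0, 0, 0, 0, 0, 0, 1, 1]))  # 7  (Example 11.3.8)
```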



Example 11.3.7 Let s = 00010011 be the first 8 terms of the sequence of
Example 11.3.4. We employ Algorithm 11.3.6 with initial values lower = 0,
upper = 8: after the first iteration, we have L = 4, lower = 0, upper = 4; after the
second iteration, we have L = 2, lower = 2, upper = 4; after the third iteration, we
have L = 3, lower = 3, upper = 4; thus, Ls,8 = 4.


Example 11.3.8 Let s = 00000011. We employ Algorithm 11.3.6 with initial
values lower = 0, upper = 8: after the first iteration, we have L = 4, lower = 4,
upper = 8; after the second iteration, we have L = 6, lower = 6, upper = 8; after
the third iteration, we have L = 7, lower = 6, upper = 7; thus, Ls,8 = 7.


Definition 11.3.9 Let s = {sn }n≥0 be a sequence of bits. For N ≥ 1, let Ls,N be
the Nth linear complexity of s. Then

Ls = sup_{N ≥ 1} {Ls,N }

is the linear complexity of s. Here sup_{N ≥ 1} {Ls,N } denotes the least upper bound
of the set {Ls,N : N ≥ 1}, cf. [51, Definition 1.8].
For example, if s is the sequence of Example 11.3.4, then Ls = 4.
Theorem 11.3.10 Ls < ∞ if and only if {sn }n≥0 is a linearly recursive sequence
over F2 .
Proof Let s = {sn } be a sequence of bits. If {sn } is an lth-order linearly recursive
sequence, then Ls,N ≤ l for all N ≥ 1. Thus Ls ≤ l < ∞.
Conversely, if {sn } is not linearly recursive, then there is no finite set of bits
a0 , a1 , . . . , al−1 with

sn+l = al−1 sn+l−1 + al−2 sn+l−2 + · · · + a0 sn

for all n ≥ 0. Thus for any integer l ≥ 1, and any set of bits a0 , a1 , . . . , al−1 , there
exists an integer N ≥ 0 so that

sN +l ≠ al−1 sN +l−1 + al−2 sN +l−2 + · · · + a0 sN .

Hence Ls,N +l+1 > l. Thus Ls is not finite.




Theorem 11.3.11 Ls < ∞ if and only if {sn }n≥0 is eventually periodic.

Proof Let s = {sn } be a sequence of bits. We show that {sn } is eventually periodic if
and only if {sn } is linearly recursive. To this end, suppose {sn } is linearly recursive.
Then {sn } is eventually periodic by Proposition 11.1.6.
For the converse, suppose that {sn } is eventually periodic. Then there exist
integers N ≥ 0, t > 0 for which sn+t = sn , for all n ≥ N . It follows that {sn }
is an (N + t)th order linearly recursive sequence with recurrence relation

sn+N +t = sn+N ,

for n ≥ 0. 

We recall the shrinking generator sequence {vm } of Section 11.2, constructed
from the linearly recursive sequences {sn } and {tn }. The sequence {sn } is the selector
sequence and is kth-order with degree k primitive polynomial f (x); {tn } is lth-order
with degree l primitive polynomial g(x).
We assume gcd(k, l) = 1 and k ≤ l. The period of s is rs = 2^k − 1, the period of
t is rt = 2^l − 1, and w = 2^{k−1} , which is the number of 1s in the first 2^k − 1 terms
of s.
We write sn = s(n), tn = t (n), and vm = v(m).
We obtain a lower bound on the linear complexity of {v(m)}.
Proposition 11.3.12 Let v = {v(m)} be a shrinking generator sequence defined by
sequences {s(n)}, {t (n)}. Then

Lv > 2^{k−2} l.

Proof By Proposition 11.2.4, v = {v(m)} has period wrt . Thus v is a linearly


recursive sequence and, as such, has matrix A with minimal polynomial m(x) ∈
F2 [x].
As in the proof of Proposition 11.2.4, we have

v(aw) = t (n0 + ars ), a ≥ 0, (11.9)

where n0 is the position of the first 1 in s.


Let B be the matrix of t. Since gcd(rs , rt ) = 1, the characteristic polynomial
of the matrix B^{rs} is a primitive polynomial h(x) over F2 of degree l; we have
h(B^{rs} ) = 0.
From (11.9), we conclude that the matrix A^w also satisfies the polynomial h(x).
Hence, m(x) divides h(x^w ). Since we are working in characteristic 2, we conclude
that m(x) divides (h(x))^w .
Since h is irreducible,

m(x) = (h(x))^c

for some 1 ≤ c ≤ w = 2^{k−1} . We claim that c > 2^{k−2} . If c ≤ 2^{k−2} , then m(x)
divides (h(x))^{2^{k−2}} . By Proposition 7.3.9, h(x) divides 1 + x^{rt} . Thus, m(x) divides

(1 + x^{rt})^{2^{k−2}} = 1 + x^{rt 2^{k−2}} .

But this says that the period of v is at most 2^{k−2} rt , which contradicts Proposi-
tion 11.2.4.
So the degree of m(x) is greater than 2^{k−2} l; hence Lv > 2^{k−2} l, as claimed.


In the special case that {vm } is constructed from the 1st order selector sequence
{sn }, sn = 1, n ≥ 0, and the lth-order sequence {tn }, we obtain

vm = tm ,

m ≥ 0, as one can easily check. In this case, Proposition 11.3.12 yields

Lv = Lt > l/2.

As a consequence of Proposition 11.3.12, Malice needs more than 2^{k−1} l
consecutive bits in the key stream to determine the recurrence relation for {vm }.
To see this, suppose that Malice needs ≤ 2^{k−1} l = 2 · 2^{k−2} l bits to determine a
linear recurrence for {vm }. Then the linear recurrence is of length ≤ 2^{k−2} l. Thus,
Lv ≤ 2^{k−2} l, which contradicts Proposition 11.3.12.
The number of bits that need to be captured to determine {vm } is an exponential
function of the order k of {sn } and a linear function of the order l of {tn }. This is
a significant improvement over the case where the key stream is generated by a
single linearly recursive sequence; in that case, the number of bits required is a
linear function of the order.

11.4 Pseudorandom Bit Generators

In Section 11.1 we constructed a bit generator with a very long period using the
theory of lth order linearly recursive sequences. However, we showed that this
bit generator is cryptographically insecure since the sequence can be recovered
knowing a subsequence of length 2l. We improved this situation in Section 11.2
by introducing the shrinking generator sequence.
In this section we develop the tools to construct other cryptographically secure
bit generators.
Let l, m be integers with m ≫ l ≥ 1, and let

G : {0, 1}^l → {0, 1}^m



be a bit generator with seed x ∈ {0, 1}^l . We want to decide whether a bit generator
G produces strings that are pseudorandom or “as good as random.” To do this, we
introduce a test.
A bit generator is pseudorandom if it is not possible to distinguish its output
from a truly random bit stream with reasonable computing power (or by using an
algorithm that is practical and efficient).
Let y0 y1 . . . yi be a truly random sequence of bits. Given y0 y1 . . . yi−1 , if we just
guessed the value of the next bit yi , then we would be correct with probability 1/2.
On the other hand, suppose that y0 y1 . . . yi is generated by G. Then G is
pseudorandom if given y0 y1 . . . yi−1 , there is no practical algorithm that will give
us a non-negligible advantage over merely guessing the value of the next bit yi .
More formally, we have the following test.
Definition 11.4.1 (Next-Bit Test) Let G : {0, 1}^l → {0, 1}^m be a bit generator. Let
x be a randomly chosen bit string in {0, 1}^l , and let

G(x) = y0 y1 y2 . . . ym−2 ym−1 ∈ {0, 1}^m .

Let i be an integer, 1 ≤ i ≤ m − 1. Let A be a probabilistic polynomial time
algorithm with input y0 y1 . . . yi−1 and output A(y0 y1 . . . yi−1 ) ∈ {0, 1}. Then G
passes the next-bit test if there exists a negligible function r : R+ → R+ and an
integer l0 so that

Pr(A(y0 y1 . . . yi−1 ) = yi ) ≤ 1/2 + r(l)

for l ≥ l0 .
Definition 11.4.2 A bit generator G is a pseudorandom bit generator if G passes
the next-bit test.
Remark 11.4.3 The algorithm in the next-bit test is a probabilistic polynomial time
algorithm in the sense of Section 4.4. The algorithm attempts to solve the decision
problem:
D = given the bit stream y0 y1 . . . yi , decide whether the next bit yi+1 is equal to
1 (YES) or equal to 0 (NO)
At issue here is whether D ∈ PP. If the bit stream is pseudorandom, then D ∉ PP.
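As a toy illustration (our own construction, not from the text): a generator whose bits are biased toward 1 fails the next-bit test, since a predictor that simply outputs the majority bit of the observed prefix is correct with probability near 3/4:

```python
import random

def biased_generator(seed, m, p_one=0.75):
    """A deliberately bad 'bit generator': each bit is 1 with probability 3/4."""
    rng = random.Random(seed)
    return [1 if rng.random() < p_one else 0 for _ in range(m)]

def majority_predictor(prefix):
    """Guess the next bit to be the majority value of the bits seen so far."""
    return 1 if 2 * sum(prefix) >= len(prefix) else 0

hits, trials = 0, 200
for seed in range(trials):
    y = biased_generator(seed, 101)
    hits += (majority_predictor(y[:100]) == y[100])
print(hits / trials)   # well above 1/2 (close to 0.75)
```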



Let G : {0, 1}^l → {0, 1}^m be a bit generator. We define GR : {0, 1}^l → {0, 1}^m
to be the bit generator that reverses the bits of G; that is, GR (x) = yR =
ym−1 ym−2 . . . y1 y0 , where G(x) = y = y0 y1 . . . ym−1 .
Proposition 11.4.4 If G is a pseudorandom generator, then GR is a pseudorandom
generator.
11.4 Pseudorandom Bit Generators 237

Proof Suppose GR : {0, 1}l → {0, 1}m is not pseudorandom. Then there exists
a positive polynomial w(x), an integer i, 1 ≤ i ≤ m − 1, and a probabilistic
polynomial time algorithm A so that

Pr(A(ym−1 ym−2 . . . yi ) = yi−1 ) ≥ 1/2 + 1/w(l)

for infinitely many l. Define an algorithm A′ as follows. Given y0 y1 . . . yi−2 , let

A′(y0 y1 . . . yi−2 ) = A(ym−1 ym−2 . . . yi ).

Then A′ is a probabilistic polynomial time algorithm with

Pr(A′(y0 y1 . . . yi−2 ) = yi−1 ) ≥ 1/2 + 1/w(l)

for infinitely many l. Thus G is not pseudorandom, a contradiction.





11.4.1 Hard-Core Predicates

Let {0, 1}∗ denote the set of all sequences of 0’s and 1’s of finite length. A predicate
is a function

B : {0, 1}∗ → {0, 1}.

For example, for x ∈ {0, 1}∗ , B(x) = (|x| mod 2) is a predicate. Here |x| is the
length of x; for example, B(10011) = 1. A predicate is a type of decision problem
(1 = YES, 0 = NO).
Definition 11.4.5 Let f : {0, 1}∗ → {0, 1}∗ be a function. A predicate B :
{0, 1}∗ → {0, 1} is a hard-core predicate for the function f if:
(i) There is a probabilistic polynomial time algorithm that computes B.
(ii) For any probabilistic polynomial time algorithm A with input f (x) and output
A(f (x)) ∈ {0, 1}, x ∈R {0, 1}l , there exists an integer l0 so that for l ≥ l0 ,

Pr(A(f (x)) = B(x)) ≤ 1/2 + r(l),
where r is a negligible function.
Hard-core predicates are essentially unbiased, i.e., there is a negligible difference between Pr(B(x) = 0) and Pr(B(x) = 1).
238 11 Key Generation

Proposition 11.4.6 Let B : {0, 1}∗ → {0, 1} be a hard-core predicate for the
function f : {0, 1}∗ → {0, 1}∗ . Let x ∈R {0, 1}l . Then there exist an integer l0
and a negligible function r(x) so that

| Pr(B(x) = 0) − Pr(B(x) = 1)| ≤ r(l)

whenever l ≥ l0 .
Proof Suppose the condition of the proposition does not hold. Then there exists a
positive polynomial w(x) for which

| Pr(B(x) = 0) − Pr(B(x) = 1)| ≥ 1/w(l)

for infinitely many l. We can assume without loss of generality that Pr(B(x) = 0) ≥
Pr(B(x) = 1), thus

Pr(B(x) = 0) − Pr(B(x) = 1) ≥ 1/w(l)

for infinitely many l. Since Pr(B(x) = 1) = 1 − Pr(B(x) = 0),

Pr(B(x) = 0) ≥ 1/2 + 1/(2w(l)).

Let A be the polynomial time algorithm with A(f (x)) = 0 for all x ∈ {0, 1}l . Then

Pr(A(f (x)) = B(x)) ≥ 1/2 + 1/(2w(l)),

for infinitely many l, and so B is not a hard-core predicate, a contradiction.





11.4.2 Hard-Core Predicates and the DLA

In this section we will apply the Discrete Logarithm Assumption (DLA), which was
stated in Section 9.4.
Let p be a random l-bit prime. We define two predicates on x ∈ {0, 1}l . The
predicate LEAST : {0, 1}l → {0, 1} is defined as

LEAST(x) = 0 if x is even (as a decimal integer), and LEAST(x) = 1 otherwise.

The predicate MOST : {0, 1}l → {0, 1} is defined as



MOST(x) = 0 if x < (p − 1)/2, and MOST(x) = 1 otherwise.

Lemma 11.4.7 Let p be an odd prime, let g be a primitive root modulo p, let
x ∈ U (Zp ), and let

y = DEXPp,g (x) = g x

denote the discrete exponential function. Then LEAST(x) = 0 if and only if y is a
quadratic residue modulo p.
Proof Suppose LEAST(x) = 0. Then x = 2m, hence y = g x = (g m )2 , and thus
y is a quadratic residue modulo p. Conversely, if y = g x is a quadratic residue
modulo p, then y = g x = (g m )2 for some m ∈ U (Zp ). Hence x = 2m, and so
LEAST(x) = 0. 

Is either of these predicates hard-core for some function, say DEXPp,g ? There
is certainly a polynomial time algorithm for computing LEAST, but LEAST is not
a hard-core predicate for f (x) = DEXPp,g (x).
Proposition 11.4.8 There is a polynomial time algorithm for computing LEAST(x)
given DEXPp,g (x).
Proof Let p be an odd prime, and let y = DEXPp,g (x) = g x for x ∈ U (Zp ). By
Lemma 11.4.7, LEAST(x) = 0 if and only if y is a quadratic residue modulo p,
hence by Proposition 6.4.5, LEAST(x) = 0 if and only if y (p−1)/2 ≡ 1 (mod p).
This is the basis for the following algorithm that runs in polynomial time:
Algorithm
Input: y = DEXPp,g (x)
Output: LEAST(x)
Algorithm:
r ← (y (p−1)/2 mod p)
if r = 1, then LEAST(x) = 0
else LEAST(x) = 1
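The algorithm runs in polynomial time because modular exponentiation does. A minimal sketch in Python rather than GAP (the function name is ours, not the book's):

```python
def least_from_dexp(y, p):
    # Euler's criterion: y = g^x (g a primitive root mod p) is a quadratic
    # residue mod p exactly when y^((p-1)/2) ≡ 1 (mod p), i.e. when x is even.
    r = pow(y, (p - 1) // 2, p)
    return 0 if r == 1 else 1
```

For example, with p = 7 and g = 3, least_from_dexp(6, 7) returns 1, matching LEAST(3) = 1, since 3^3 ≡ 6 (mod 7) with odd exponent.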



On the other hand, MOST(x) is a hard-core predicate of DEXPp,g (x). To see
this, we first prove a lemma.
Lemma 11.4.9 Let p be a prime, and let b be a quadratic residue modulo p. Then
there exists a probabilistic polynomial time algorithm for computing the two square
roots of b modulo p.

Proof In the case that p ≡ 3 (mod 4) (p is Blum), this follows from Proposi-
tion 6.4.5. If p ≡ 1 (mod 4), use the result of E. Berlekamp [5].
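For the Blum case, the square roots can be written down explicitly. A minimal Python sketch (our function name), assuming p ≡ 3 (mod 4) and b a quadratic residue modulo p:

```python
def sqrt_mod_blum(b, p):
    # For a Blum prime p (p ≡ 3 mod 4) and a quadratic residue b,
    # r = b^((p+1)/4) satisfies r^2 = b^((p+1)/2) = b * b^((p-1)/2) ≡ b (mod p).
    assert p % 4 == 3
    r = pow(b, (p + 1) // 4, p)
    return r, p - r
```

For instance, the two square roots of 2 modulo 7 come out as 4 and 3.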


Proposition 11.4.10 Under the Discrete Logarithm Assumption, MOST(x) is a
hard-core predicate of DEXPp,g (x).
Proof Suppose MOST(x) is not a hard-core predicate of DEXPp,g (x). Then we
show that the DLA cannot hold.
If MOST(x) is not hard-core, then, since Definition 11.4.5 (i) clearly holds, we
have that Definition 11.4.5 (ii) fails; that is, there exists a probabilistic polynomial
time algorithm A with input DEXPp,g (x) and output A(DEXPp,g (x)) ∈ {0, 1},
x ∈R {0, 1}l , and a positive polynomial w(x) for which
Pr(A(DEXPp,g (x)) = MOST(x)) ≥ 1/2 + 1/w(l)
for infinitely many l.
To simplify the proof, we assume that A is a polynomial time algorithm that
always computes MOST correctly:

A(DEXPp,g (x)) = MOST(x)

for all x ∈R {0, 1}l and all l. This A will be used to devise a probabilistic polynomial
time algorithm A′, which will compute DLOG(g x ) = x, thus contradicting the
DLA.
Here is the algorithm A′.
Algorithm A′:
Input: y = g x = DEXPp,g (x)
Output: x = DLOGp,g (y)

Round 1
Let y0 = y = (g x mod p), use the algorithm of Proposition 11.4.8 to compute
LEAST(x); let b0 = LEAST(x).
If b0 = 0 (x is even) let y1 = g x/2 , end of Round 1.
Else, if b0 = 1 (x is odd), compute (g x )/g = g x−1 , with x − 1 even. Thus
g x−1 is a quadratic residue modulo p and using the algorithm of Lemma 11.4.9,
compute the two square roots, r1 = g (x−1)/2 and r2 = g (x−1)/2+(p−1)/2 . Now using
A we obtain

A(r1 ) = MOST((x − 1)/2) = 0,

A(r2 ) = MOST((x − 1)/2 + (p − 1)/2) = 1.

Let y1 = r1 = g (x−1)/2 , end of Round 1.


So, after Round 1, either y1 = g x/2 or y1 = g (x−1)/2 .

Round 2
If y1 = (g x/2 mod p) (b0 = 0), use the algorithm of Proposition 11.4.8 to compute
b1 = LEAST(x/2). If b1 = 0 (x/2 is even) let y2 = g x/4 , end of Round 2.
Else, if b1 = 1 (x/2 is odd), compute (g x/2 )/g = g x/2−1 = g (x−2)/2 , with
(x − 2)/2 even. Thus g (x−2)/2 is a quadratic residue modulo p, and using the
algorithm of Lemma 11.4.9, compute the two square roots, r1 = g (x−2)/4 and
r2 = g (x−2)/4+(p−1)/2 . Now use A:

A(r1 ) = MOST((x − 2)/4) = 0,

A(r2 ) = MOST((x − 2)/4 + (p − 1)/2) = 1.

Let y2 = r1 = g (x−2)/4 , end of Round 2.


If y1 = g (x−1)/2 (b0 = 1), use the algorithm of Proposition 11.4.8 to compute
b1 = LEAST((x − 1)/2). If b1 = 0 ((x − 1)/2 is even) let y2 = g (x−1)/4 , end of
Round 2.
Else, if b1 = 1 ((x − 1)/2 is odd), compute (g (x−1)/2 )/g = g (x−1)/2−1 =
g (x−3)/2 , with (x − 3)/2 even. Thus g (x−3)/2 is a quadratic residue modulo p
and using the algorithm of Lemma 11.4.9, compute the two square roots,
r1 = g (x−3)/4 and r2 = g (x−3)/4+(p−1)/2 . Now use A:

A(r1 ) = MOST((x − 3)/4) = 0,

A(r2 ) = MOST((x − 3)/4 + (p − 1)/2) = 1.

Let y2 = r1 = g (x−3)/4 , end of Round 2.


So, after Round 2, we have one of the following:

y2 = g x/4 , y2 = g (x−2)/4 , y2 = g (x−1)/4 , y2 = g (x−3)/4 .

This terminates after m rounds with ym = 1 and with bm−1 bm−2 . . . b2 b1 b0 the
binary expansion of x = DLOGp,g (y).


Here is a numerical example of algorithm A′.
Example 11.4.11 Let p = 7, a Mersenne prime, with primitive root g = 3. We
compute x = DLOG7,3 (6) using A′.

Round 1
y0 = 6. By Proposition 11.4.8, LEAST(DLOG7,3 (6)) = 1 since 6^3 ≡ −1 (mod 7), thus
b0 = 1.
Since b0 = 1, we compute 6/3 = 2, which is a quadratic residue modulo 7; the
two square roots of 2 are r1 = 3 and r2 = 4.

Now, we use A to obtain

A(3) = MOST(DLOG7,3 (3)) = MOST(1) = 0,

A(4) = MOST(DLOG7,3 (4)) = MOST(4) = 1.

Let y1 = 3, end of Round 1.


At the end of Round 1, y1 = 3 and b0 = 1.

Round 2
Since b0 = 1, use the algorithm of Proposition 11.4.8 to compute b1 =
LEAST(DLOG7,3 (3)) = LEAST(1) = 1. We compute 3/3 = 1, which is a
quadratic residue modulo 7; the two square roots of 1 are r1 = 1 and r2 = 6.
Now, we use A to obtain

A(1) = MOST(DLOG7,3 (1)) = MOST(0) = 0,

A(6) = MOST(DLOG7,3 (6)) = MOST(3) = 1.

Let y2 = 1, end of Round 2.


A′ terminates with b0 = 1, b1 = 1, and (11)2 = 3. Thus, DLOG7,3 (6) = 3.
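The rounds of A′ can be sketched in Python. This is illustrative only: the oracle A is simulated here by brute-force discrete logarithm (so nothing below runs in polynomial time), the function names are ours, and p is assumed to be a Blum prime so that square roots can be taken as in Lemma 11.4.9.

```python
def most_oracle(p, g, z):
    # Stand-in for the assumed algorithm A: recover MOST(x) from g^x mod p
    # by brute-force discrete logarithm (exponential time, for illustration).
    x = next(e for e in range(p - 1) if pow(g, e, p) == z % p)
    return 0 if x < (p - 1) // 2 else 1

def dlog_with_most(p, g, y):
    # Recover x from y = g^x mod p, one bit per round.  Square roots are
    # computed as in Lemma 11.4.9, so we assume p ≡ 3 (mod 4).
    assert p % 4 == 3
    bits = []
    while y % p != 1:
        b = 0 if pow(y, (p - 1) // 2, p) == 1 else 1   # b = LEAST(x), Prop. 11.4.8
        if b == 1:
            y = y * pow(g, -1, p) % p                  # exponent is now x - 1, even
        s = pow(y, (p + 1) // 4, p)                    # the two roots are s and p - s
        y = s if most_oracle(p, g, s) == 0 else p - s  # keep the root g^((x-b)/2)
        bits.append(b)
    return sum(b << i for i, b in enumerate(bits))     # b_0 is the low-order bit
```

For p = 7, g = 3, y = 6 this returns 3, in agreement with the example above.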



11.4.3 The Blum–Micali Bit Generator

Definition 11.4.12 Let p be a random l-bit prime, and let g be a primitive root
modulo p. Let x be a randomly chosen element of U (Zp ). Let x0 = x and set

xi = (g xi−1 mod p),

for i ≥ 1. Let bi = MOST(xi ), i ≥ 0. Then the sequence {bi }i≥0 is the Blum–
Micali sequence with seed x.
Example 11.4.13 Take p = 31, a 5-bit prime, with g = 3. Let x = 11 be the seed.
Then

{xi } = 11, 13, 24, 2, 9, 29, 21, 15, 30, 1, 3, 27, 23, 11, 13, . . . ,

and

{bi } = 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, . . .


Example 11.4.14 Take p = 19, a 5-bit prime, with g = 2, (19)2 = 10011. Let
x = 5, (5)2 = 00101 be the seed. Then

{xi } = 5, 13, 3, 8, 9, 18, 1, 2, 4, 16, 5, 13, 3, . . .

and

{bi } = 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, . . .


Definition 11.4.15 Let x ∈R U (Zp ). The bit generator G : {0, 1}l → {0, 1}m
defined as

G(x) = b0 b1 b2 . . . bm−1 ,

where bi = MOST((g xi−1 mod p)), 0 ≤ i ≤ m − 1, is the Blum–Micali bit
generator.
In Example 11.4.13, l = 5, x = 11, (11)2 = 01011. We have G : {0, 1}5 →
{0, 1}m , with

G(01011) = 001001111001100 . . .
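A Python sketch of the Blum–Micali generator (the function name is ours); it reproduces the bit strings of Examples 11.4.13 and 11.4.14:

```python
def blum_micali(p, g, seed, m):
    # x_0 = seed; x_i = g^(x_{i-1}) mod p; output b_i = MOST(x_i),
    # i.e. 0 when x_i < (p-1)/2 and 1 otherwise.
    x, bits = seed, []
    for _ in range(m):
        bits.append(0 if x < (p - 1) // 2 else 1)
        x = pow(g, x, p)
    return "".join(map(str, bits))
```

Here blum_micali(31, 3, 11, 15) returns "001001111001100", the string G(01011) above.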

Proposition 11.4.16 Under the DLA, the Blum–Micali bit generator is pseudorandom.
Proof Let G be the Blum–Micali generator with seed x0 = x. We show that GR is
pseudorandom, thus by Proposition 11.4.4, so is G.
By way of contradiction, suppose that GR is not pseudorandom; GR fails the
next-bit test. Then there exist a positive polynomial w, an integer i, 1 ≤ i ≤ m − 1,
and a probabilistic polynomial time algorithm A so that

Pr(A(bm−1 bm−2 . . . bi ) = bi−1 ) ≥ 1/2 + 1/w(l)

for infinitely many l. Now,

bi = MOST(DEXPp,g^i (x0 )) = MOST(DEXPp,g (DEXPp,g^(i−1) (x0 ))) = MOST(DEXPp,g (z)),

where z = DEXPp,g^(i−1) (x0 ) and MOST(z) = bi−1 .


We define a new algorithm A′ as follows. Since DEXPp,g is a permutation, we
may consider z as a randomly chosen element of U (Zp ). On input DEXPp,g (z), the
output of A′ is A(bm−1 bm−2 . . . bi ). Thus,

Pr(A′(DEXPp,g (z)) = MOST(z)) = Pr(A(bm−1 bm−2 . . . bi ) = bi−1 ) ≥ 1/2 + 1/w(l),

for infinitely many l. This says that MOST(x) is not a hard-core predicate for DEXPp,g (x),
which contradicts Proposition 11.4.10.


Proposition 11.4.17 The Blum–Micali sequence is periodic.
Proof The function DEXPp,g has a finite codomain U (Zp ). Thus the sequence {xi } with x0 = x and

xi = (g xi−1 mod p)

for i ≥ 1 is periodic. Therefore so is the Blum–Micali sequence bi = MOST(xi ).




Example 11.4.18 Let {bi } = 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, . . . be the Blum–
Micali sequence of Example 11.4.14; DEXP19,2 (x) is a permutation of U (Z19 ) and
can be written in standard two-row notation as

DEXP19,2 = ( 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
             2 4 8 16 13 7 14 9 18 17 15 11 3 6 12 5 10 1 ).

The cyclic decomposition of DEXP19,2 is

(1, 2, 4, 16, 5, 13, 3, 8, 9, 18)(6, 7, 14)(10, 17)(11, 15, 12).

Since the seed 5 is in the cycle (1, 2, 4, 16, 5, 13, 3, 8, 9, 18) of length 10, the Blum–
Micali sequence has period 10.


So here we have a pseudorandom sequence that is periodic! This is possible since
the period of a Blum–Micali sequence is “non-polynomial.”
Proposition 11.4.19 Let p be a random l-bit prime, let g be a primitive root modulo
p, and let x ∈R U (Zp ). Let w be any positive polynomial. Under the DLA, there
exists an integer l0 so that the Blum–Micali sequence {bi }i≥0 has period greater
than w(l) for l ≥ l0 .

Proof Suppose there exists a polynomial w so that {bi } has period ≤ w(l) for
infinitely many l. Then there exist a polynomial time algorithm A and an integer 0 ≤ i ≤
w(l) − 2 for which

Pr(A(b0 b1 . . . bi ) = bi+1 ) = 1.

In fact, A runs in time O(w(l)). Thus the Blum–Micali generator fails the next-
bit test, and so, the Blum–Micali bit generator is not pseudorandom. But then
Proposition 11.4.16 implies that the DLA cannot hold. 


11.4.4 The Quadratic Residue Assumption

Let p, q be distinct primes, and let n = pq. Define a function

DSQRn : U (Zn ) → U (Zn )

by the rule

DSQRn (x) = (x 2 mod n).

As in Section 6.4.1, we let QRn denote the set of quadratic residues modulo n,
i.e., those elements x ∈ U (Zn ) for which there exists y ∈ U (Zn ) with x ≡ y 2
(mod n).
There are φ(n) = (p − 1)(q − 1) elements in U (Zn ). Let
Jn^(1) = { x ∈ U (Zn ) : (x/n) = 1 },

Jn^(−1) = { x ∈ U (Zn ) : (x/n) = −1 },

where (x/n) denotes the Jacobi symbol.
Then |Jn^(1)| = |Jn^(−1)| = (p − 1)(q − 1)/2. We have QRn ⊆ Jn^(1); exactly half of
the elements of Jn^(1) are quadratic residues modulo n: |QRn | = (p − 1)(q − 1)/4.
Let f : Jn^(1) → {0, 1} be the function defined as

f (x) = 0 if x is a quadratic residue modulo n, and f (x) = 1 otherwise.

The function f is unbiased in the sense that

| Pr(f (x) = 0) − Pr(f (x) = 1)| = | 1/2 − 1/2 | = 0,

since Pr(f (x) = 0) = Pr(f (x) = 1) = ((p − 1)(q − 1)/4)/((p − 1)(q − 1)/2) = 1/2.
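Membership in Jn^(1) can be decided without knowing p and q, since the Jacobi symbol (x/n) is computable in polynomial time. A sketch of the standard binary Jacobi-symbol algorithm in Python (our code, not from the book):

```python
from math import gcd

def jacobi(a, n):
    # Jacobi symbol (a/n) for odd n > 0, via reduction and reciprocity.
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:                 # pull out factors of 2
            a //= 2
            if n % 8 in (3, 5):           # (2/n) = -1 when n ≡ 3, 5 (mod 8)
                result = -result
        a, n = n, a                       # quadratic reciprocity (both odd here)
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0        # result is 0 when gcd(a, n) > 1
```

For n = 437 = 19 · 23 one can check that exactly (p − 1)(q − 1)/2 = 198 of the 396 elements of U (Z437 ) have Jacobi symbol 1, and that all 99 quadratic residues lie among them.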

Given x ∈ Jn^(1), it seems difficult to predict whether x is in QRn or not. If we
use a coin flip to guess whether x is a quadratic residue (i.e., we guess x ∈ QRn if
“heads” and x ∉ QRn if “tails”), then the probability of guessing correctly is 1/2. If
we always guess that x is a quadratic residue, then the probability that we will be
correct is 1/2. So the issue is the following: is there some efficient, practical algorithm
for guessing that will yield the correct result significantly more than half of the time
(i.e., with probability ≥ 1/2 + ε for some ε > 0)?
We assume no such algorithm exists.
The Quadratic Residue Assumption (QRA) Let w(x) ∈ Z[x] be a positive
polynomial, let p, q be randomly chosen odd l-bit primes, and let n = pq. Let
x ∈R Jn^(1). Let A be a probabilistic polynomial time algorithm with input x and
output A(x) ∈ {0, 1}. Then there exists an integer l0 for which

Pr(A(x) = f (x)) < 1/2 + 1/w(l)

whenever l ≥ l0 .
The QRA says that f (x) is essentially unbiased (but we already knew that).
Proposition 11.4.20 For x ∈R Jn^(1), there exist a negligible function r and an
integer l0 so that

| Pr(f (x) = 0) − Pr(f (x) = 1)| ≤ r(l)

whenever l ≥ l0 .
Proof Suppose no such r exists. Then as a function of l, | Pr(f (x) = 0)−Pr(f (x) =
1)| is not a negligible function. Hence, there exists a positive polynomial w(x) with

| Pr(f (x) = 0) − Pr(f (x) = 1)| ≥ 1/w(l)

for infinitely many l. Without loss of generality, we can assume Pr(f (x) = 0) ≥ Pr(f (x) =
1), hence Pr(f (x) = 0) − Pr(f (x) = 1) ≥ 1/w(l). Since Pr(f (x) = 1) = 1 −
Pr(f (x) = 0),

Pr(f (x) = 0) ≥ 1/2 + 1/(2w(l)),

for infinitely many l. Note that 2w(x) is a positive polynomial. Now let A be the polynomial
time algorithm that satisfies A(x) = 0 for all x ∈ Jn^(1). Then

Pr(A(x) = f (x)) ≥ 1/2 + 1/(2w(l)),

for infinitely many l, which contradicts the QRA.




The QRA is equivalent to the Factoring Assumption (FA) of Section 9.3.
Proposition 11.4.21 The QRA holds if and only if the FA holds.
Proof See [8, Note added in proof] and [1]. 


11.4.5 The Blum–Blum–Shub Bit Generator

Clearly, DSQRn restricts to a function

DSQRn : QRn → QRn .

Lemma 11.4.22 Suppose p, q are distinct Blum primes, i.e., p, q ≡ 3 (mod 4).
Then the function DSQRn : QRn → QRn is a 1–1 correspondence.
Proof Suppose a is a quadratic residue modulo n. By Proposition 6.4.9, a has
exactly four square roots modulo n: x, −x, y, −y. By Proposition 6.4.10, exactly
one of them, say x, is in QRn . Thus DSQRn is onto. It follows that DSQRn is 1–1.


Proposition 11.4.23 Under the Quadratic Residue Assumption, LEAST(x) is a
hard-core predicate of DSQRn (x).
Proof Suppose LEAST(x) is not a hard-core predicate of DSQRn (x). Then we
show that the QRA cannot hold.
If LEAST(x) is not hard-core, then, since Definition 11.4.5(i) clearly holds, we
have that Definition 11.4.5(ii) fails, i.e., there exists a probabilistic polynomial time
algorithm A with input DSQRn (x), x ∈ QRn , and output A(DSQRn (x)) ∈ {0, 1}
and a positive polynomial w(x) for which

Pr(A(DSQRn (x)) = LEAST(x)) ≥ 1/2 + 1/w(l)

for infinitely many l. This A will be used to devise a probabilistic polynomial time
algorithm A′ that exhibits an “ε-advantage” in guessing whether an element of Jn^(1)
is a quadratic residue or not. This will contradict the QRA.


Here is the algorithm A′.
Let x ∈ Jn^(1) and put z = (x 2 mod n). Then z ∈ QRn and hence, by
Proposition 6.4.9, z has exactly four square roots modulo n: x, −x, y, −y. We
have x, −x ∈ Jn^(1) with LEAST(x) ≠ LEAST(−x) (since n is odd, x and n − x have opposite parity).
Next, we use A to (attempt to) compute LEAST(r), where r is the unique square
root (in QRn ) of z (Proposition 6.4.10). If LEAST(r) = LEAST(x), then x ∈ QRn ;
set A′(x) = 0. If LEAST(r) = LEAST(−x), then x ∉ QRn ; set A′(x) = 1. We
have

Pr(A(DSQRn (r)) = LEAST(r)) = Pr(A′(x) = f (x)),

and so,

Pr(A′(x) = f (x)) ≥ 1/2 + 1/w(l)

for infinitely many l, contradicting the QRA.




Definition 11.4.24 Let p, q be random l-bit Blum primes, let n = pq, let x ∈R
U (Zn ), and let x0 = (x^2 mod n). For i ≥ 1, let xi = (xi−1^2 mod n), and let bi =
LEAST(xi ). Then the sequence {bi }i≥0 is the Blum–Blum–Shub (BBS) sequence
with seed x.
Example 11.4.25 Suppose p = 19, q = 23, which are 5-bit Blum primes. Then
n = pq = 437. Choose x = 2 ∈ U (Z437 ). Then
{xi } = 4, 16, 256, 423, 196, 397, 289, 54, 294, 347, 234, 131, 118, 377, 104,
328, 82, 169, 156, 301, 142, 62, 348, 55, 403, 282, 427, 100, 386, 416, 4, 16,
256, 423, 196, 397, 289, 54, 294, 347, 234, 131, 118, 377, 104, 328, 82, 169,
156, 301, 142, 62, 348, 55, 403, 282, 427, 100, 386, 416, 4, 16, 256, 423,
196, 397, . . .
And so, the BBS sequence is
{bi } = 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0,
0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0,
1, 0, 1, . . .


Here is a very simple GAP program to compute the first 100 terms of the BBS of
Example 11.4.25.
p:=2;
for i in [1..100] do;
p:=PowerMod(Integers,p,2,437);
Print(p mod 2,",");
od;
Example 11.4.26 Suppose p = 71, q = 127, which are 7-bit Blum primes. Then
n = pq = 9017. Choose x = 2019 ∈ U (Z9017 ). Then
{xi } = 677, 7479, 2990, 4253, 8924, 8649, 169, 1510, 7816, 8698, 2574,
6998, 677, 7479, 2990, 4253, 8924, 8649, 169, 1510, 7816, 8698, 2574, 6998,
677, 7479, 2990, 4253, 8924, 8649, 169, 1510, 7816, 8698, 2574, 6998, 677,
7479, 2990, . . .

And so the BBS sequence is


{bi } = 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1,
1, 0, 0, 0, 0, 0, . . .


Definition 11.4.27 Let p, q be random l-bit Blum primes, let n = pq, and let
x ∈R U (Zn ). Let l′ be the length of x. The bit generator

G : {0, 1}^l′ → {0, 1}^m ,

m > l′, defined as

G(x) = b0 b1 b2 . . . bm−1 ,

where bi = LEAST(xi ) is the Blum–Blum–Shub bit generator with seed x.


In Example 11.4.26, x = 2019, (2019)2 = 11111100011. We have G :
{0, 1}11 → {0, 1}20 , with G(11111100011) = 11010110000011010110.
Proposition 11.4.28 Under the QRA, the Blum–Blum–Shub bit generator is pseudorandom.
Proof Let G be the Blum–Blum–Shub generator with seed x. We show that GR is
pseudorandom, thus by Proposition 11.4.4, so is G.
By way of contradiction, suppose that GR is not pseudorandom; GR fails the
next-bit test. Then there exist a positive polynomial w, an integer i, 0 ≤ i ≤ m − 2,
and a probabilistic polynomial time algorithm A so that

Pr(A(bm−1 bm−2 . . . bi+1 ) = bi ) ≥ 1/2 + 1/w(l)

for infinitely many l. Now,

bi+1 = LEAST(DSQRn^(i+1) (x0 ))

= LEAST(DSQRn (DSQRn^i (x0 )))

= LEAST(DSQRn (z)),

where z = DSQRn^i (x0 ) and LEAST(z) = bi .


We define a new algorithm A′ as follows. Since DSQRn is a permutation of QRn ,
we may consider z as a randomly chosen element of QRn . On input DSQRn (z), the
output of A′ is A(bm−1 bm−2 . . . bi+1 ). Thus,

Pr(A′(DSQRn (z)) = LEAST(z)) = Pr(A(bm−1 bm−2 . . . bi+1 ) = bi ) ≥ 1/2 + 1/w(l),

for infinitely many l. This says that LEAST is not a hard-core predicate for DSQRn , a
contradiction.


Since QRn is finite, all BBS sequences are periodic. What is the period of the
BBS sequence? By inspection, we find that the period of the BBS sequence in
Example 11.4.25 is 30, and the period of the BBS sequence in Example 11.4.26
is 12.
Proposition 11.4.29 Let p, q be random l-bit Blum primes, let n = pq, and let
x ∈R U (Zn ). Let d be the order of x0 in U (Zn ), and write d = 2^e m, where m is
odd. Then
(i) The period of the sequence {xi }i≥0 is the order r of 2 in U (Zm ).
(ii) The period of the BBS sequence {bi }i≥0 is less than or equal to r.
Proof
For (i): The sequence {xi } appears as x0 , x0^2 , x0^(2^2) , x0^(2^3) , . . . modulo n. So the period
of {xi } is the smallest r > 0 for which 2^(s+r) ≡ 2^s (mod d) for some s ≥ 1.
Write s = e + l for some integer l. Then 2^(e+l) 2^r ≡ 2^(e+l) (mod d), or 2^(e+l) 2^r =
2^(e+l) + (2^e m)t, t ∈ Z, hence 2^l 2^r ≡ 2^l (mod m). Since m is odd, 2^l ∈ U (Zm ),
and so, 2^r ≡ 1 (mod m). Thus r ≥ r′, where r′ is the order of 2 in U (Zm ). Suppose
r > r′. Then working the argument above in reverse, we obtain 2^(s+r′) ≡ 2^s
(mod d) for some s ≥ 1, which contradicts the minimality of r. Thus r = r′.
For (ii): Since bi = LEAST(xi ) and {xi } has period r, the period of {bi } is at most r (indeed it divides r).


Example 11.4.30 In Example 11.4.25, the order of 4 in U (Z19 ) is 9 and the
order of 4 in U (Z23 ) is 11. By Proposition 6.3.3, the order of 4 in U (Z437 ) is
d = lcm(9, 11) = (9 · 11)/ gcd(9, 11) = 99. Now, 99 is odd, so m = 99. The order of
2 in U (Z9 ) is φ(9) = 6. The order of 2 in U (Z11 ) is 10, and so, the order of 2 in
U (Z99 ) is lcm(6, 10) = 30, which is the period of both {xi } and {bi }.
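These period computations are easy to check numerically. A brute-force Python sketch (the helper names are ours), feasible only for small moduli such as those of Examples 11.4.25 and 11.4.26:

```python
def multiplicative_order(a, m):
    # Smallest k >= 1 with a^k ≡ 1 (mod m); assumes gcd(a, m) = 1.
    k, t = 1, a % m
    while t != 1:
        t = t * a % m
        k += 1
    return k

def bbs_x_period(n, x0):
    # DSQR_n permutes QR_n, so iterating squaring from x0 in QR_n
    # returns to x0; count the steps.
    x, period = pow(x0, 2, n), 1
    while x != x0:
        x = pow(x, 2, n)
        period += 1
    return period
```

For n = 437 and x0 = 4 the period of {xi } comes out as 30, and for n = 9017, x0 = 677 it comes out as 12.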


How can we guarantee that the BBS sequence has a long period?
A prime p is a safe prime if p = 2p′ + 1 for some prime p′. A 2-safe prime is
a safe prime p = 2p′ + 1 in which the prime p′ is itself a safe prime, i.e., p′ = 2p″ + 1
for some prime p″. For example, p = 11 = 2 · 5 + 1 is a safe prime and p = 23 =
2 · 11 + 1 is a 2-safe prime. Every 2-safe prime (and safe prime) is a Blum prime.
Proposition 11.4.31 Let p, q be random l-bit 2-safe primes, p = 2p′ + 1, p′ =
2p″ + 1, q = 2q′ + 1, q′ = 2q″ + 1, for primes p′, p″, q′, q″. Let n = pq, and let x0 be
a seed in Zn with gcd(x0 , p) = gcd(x0 , q) = 1, x0 ≢ ±1 (mod p), x0 ≢ ±1 (mod q).
Let {xi }i≥0 be the BBS sequence given as xi = DSQRn (xi−1 ) for i ≥ 1. Then {xi }
has period at least p″q″.
Proof Since gcd(x0 , p) = 1, the order of x0 in U (Zp ) divides |U (Zp )| =
p − 1 = 2p′. Since x0 ≢ ±1 (mod p), the order of x0 in U (Zp ) is either p′ or
2p′. Likewise, the order of x0 in U (Zq ) is either q′ or 2q′. Thus by Proposition 6.3.3,
the order of x0 in U (Zn ) is either p′q′ or 2p′q′.
By Proposition 11.4.29, the period of {xi } is the order of 2 in U (Zp′q′ ). The order
of 2 in U (Zp′ ) divides p′ − 1 = 2p″, thus is either p″ or 2p″. Likewise, the order of
2 in U (Zq′ ) is either q″ or 2q″. Thus the order of 2 in U (Zp′q′ ) is at least p″q″.

Remark 11.4.32 The function DSQRn : QRn → QRn that is iterated to create a
BBS sequence is a permutation, a bijection. Also, |QRn | = (p − 1)(q − 1)/4, as we have
seen in Proposition 6.4.8(ii). Now,

O(|QRn |) = O((p − 1)(q − 1)/4) = O(pq) = O(n).

In view of Proposition 11.4.31, there is a BBS sequence whose period is

O(p″q″) = O(p′q′) = O(pq) = O(n).

This BBS sequence has period on the order of the size of QRn . This is expected in
view of Proposition 9.3.10.



11.5 Exercises

1. Let {sn } be the sequence in Q given as

1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, . . . .

Show that {sn } is a linearly recursive sequence.


2. Let {sn } be the sequence in Q given as

1, 2, 3, 4, 5, 6, 7, 8, . . . .

Show that {sn } is a 3rd-order linearly recursive sequence.


3. Let {sn } be the sequence in Q given as:

3, 2, 5, 1, 7, 0, 0, 0, 0, 0 . . . .

Show that {sn } is a linearly recursive sequence.


4. Let {sn } be the sequence in Q given as:

1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, . . . .

Show that {sn } is not a linearly recursive sequence in Q.



5. Let {sn } be the 2nd-order linearly recursive sequence in R with recurrence


relation

sn+2 = 2sn+1 + sn

and initial state vector s0 = (−1, 2).


(a) Compute the matrix and the characteristic polynomial for {sn }.
(b) Find a formula for the nth term of the sequence.
6. Let K be a field, and let A be a matrix in Matl (K). Prove that the set of all
polynomials that annihilate A is a non-zero ideal of K[x].
7. Let F5 be the Galois field with 5 elements. Write out the first five terms of the
linearly recursive sequence {sn } with recurrence relation

sn+2 = 4sn+1 + sn ,

and initial state vector s0 = (2, 3). What is the period of {sn }?
8. Let F9 = F3 (α) be the Galois field with 9 elements given in Example 7.3.14.
Write out the first five terms of the linearly recursive sequence {sn } with
recurrence relation

sn+5 = sn+3 + sn+1 ,

and initial state vector s0 = (1, α, 2, 1, α). What is the period of {sn }?
9. Suppose that 01001101 is the first byte of a key stream generated by a 4th order
linearly recursive sequence {sn } over F2 .
(a) Find the recurrence relation and the characteristic polynomial of {sn }. What
is the period of {sn }?
(b) What can be said if an attacker obtains only the first six bits 010011 of the
sequence?
10. Let {sn } be a linearly recursive sequence over F2 with primitive characteristic
polynomial of degree l. Prove that Ls = l.
11. Let {sn } be a linearly recursive sequence over F2 with primitive characteristic
polynomial of degree l. Prove that the state vectors

s0 , s1 , s2 , . . . , s2^l −2

constitute all possible non-zero vectors of length l over F2 .


12. Prove that gcd(k, l) = 1 if and only if gcd(2^k − 1, 2^l − 1) = 1.
13. Write a GAP program to compute the first 1000 terms of the shrinking generator
sequence where the selector sequence s has characteristic polynomial f (x) =
x 3 + x + 1 and initial state s0 = 110, and the sequence t has characteristic
polynomial g(x) = x 2 + x + 1 and initial state t0 = 10.

14. The Thue–Morse sequence over F2 is the sequence s = {sn }, n ≥ 0, defined


as: sn = 0 if the number of 1s in the canonical base-2 representation of n is
even, and sn = 1 otherwise. Thus,

{sn } = 0110100110010110100101100, . . .

Let t = {tn } be a 4th-order linearly recursive sequence with recurrence relation


tn+4 = tn+1 + tn and initial state t0 = 1110.
(a) Using s as the selector sequence, find the first 10 terms of the shrinking
generator sequence v = {vm } constructed from s and t.
(b) It is known that s is not periodic. Is v periodic?
15. Let p(x) be a positive polynomial in Z[x] of degree d ≥ 2. Let s = {sn } be
the Thue–Morse sequence (as defined in Exercise 14). The subsequence tp =
{tn }n≥0 defined as

tn = sp(n) ,

n ≥ 0, is the Thue–Morse sequence along p(x).


Let Ltp ,N denote the Nth linear complexity of tp . Then P. Popoli [45,
Theorem 3] has shown that there exist a constant c > 0 (depending on p(x))
and an integer N0 so that

Ltp ,N ≥ cN^(1/d) ,

whenever N ≥ N0 . Thus the linear complexity of tp is ∞.


The evidence is that tp has strong cryptographic properties.
(a) Use GAP to compute the first 1000 terms of tp where p(x) = x 2 .
(b) Use GAP to compute the first 1000 terms of tp where p(x) = x 4 + 3x 2 +
x + 5.
16. Let G : {0, 1}l → {0, 1}m be a bit generator defined by an lth order linearly
recursive sequence {sn }. Is G pseudorandom?
17. Let p = 977, a 10-bit prime, (977)2 = 1111010001. Let g = 3, and let x =
(29 )2 = 109 be the seed.
(a) Compute the sequence {xi } and the Blum–Micali bit sequence {bi }.
(b) Consider the associated stream cipher with {bi } as its key stream. Encrypt
M = 101010101010101010101.
18. Prove that every safe prime is a Blum prime.
19. Let p = 19, q = 23.
(a) Show that 19, 23 are 5-bit Blum primes.
(b) Compute the Blum–Blum–Shub sequence {bi } with seed x = 200 ∈
U (Z437 ).
(c) Compute the period of {bi }.
Chapter 12
Key Distribution

AES and other symmetric key cryptosystems are in wide use because they transmit
data much faster than public key cryptosystems (e.g., RSA), but they need a
shared secret key. Moreover, the Blum–Micali and Blum–Blum–Shub bit generators
require both Alice and Bob to share an initial key consisting of a finite string of
random bits.
In this chapter we introduce the Diffie–Hellman key exchange protocol, which is
a method to distribute keys (or other information) securely among the principals in
a network.

12.1 The Diffie–Hellman Key Exchange Protocol

Protocol 12.1.1 (Diffie–Hellman Key Exchange Protocol)


Premise: Alice and Bob share a large prime p and a primitive root g
modulo p.
Goal: Alice and Bob share a secret random element of Z× p.

1. Alice randomly chooses an integer x, 0 ≤ x ≤ p − 2. She computes h =


(g x mod p) and sends h to Bob.
2. Bob randomly chooses an integer y, 0 ≤ y ≤ p − 2. He computes k =
(g y mod p) and sends k to Alice.
3. Bob computes

(hy mod p) = ((g x )y mod p) = (g xy mod p).

© Springer Nature Switzerland AG 2022
R. G. Underwood, Cryptography for Secure Encryption, Universitext,
https://doi.org/10.1007/978-3-030-97902-7_12

4. Alice computes

(k x mod p) = ((g y )x mod p) = (g yx mod p) = (g xy mod p).

5. Alice and Bob share the secret random integer (g xy mod p).


Example 12.1.2 Suppose Alice and Bob share prime p = 7057 and primitive root
5 modulo 7057.
Alice randomly chooses integer x = 365, computes

h = 3448 = (5365 mod 7057),

and sends 3448 to Bob.


Bob chooses y = 61, computes

k = 1822 = (561 mod 7057),

and sends 1822 to Alice.


Alice computes

(1822365 mod 7057) = 715.

Bob computes

(344861 mod 7057) = 715.

In binary, (715)2 = 1011001011, and so Alice and Bob share the secret random
string of bits 1011001011.
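The protocol is only a few lines of Python; a sketch (the function is ours, using the parameters of Example 12.1.2):

```python
import secrets

def diffie_hellman(p, g):
    # Each party picks a private exponent, publishes g^exponent mod p,
    # and raises the other party's value to its own exponent.
    x = secrets.randbelow(p - 1)            # Alice's secret exponent
    y = secrets.randbelow(p - 1)            # Bob's secret exponent
    h = pow(g, x, p)                        # Alice sends h to Bob
    k = pow(g, y, p)                        # Bob sends k to Alice
    return pow(k, x, p), pow(h, y, p)       # both equal g^(xy) mod p

alice_secret, bob_secret = diffie_hellman(7057, 5)
assert alice_secret == bob_secret
```

The two returned values always agree, since (g^y)^x ≡ (g^x)^y ≡ g^(xy) (mod p).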



12.2 The Discrete Logarithm Problem

The standard attack on the Diffie–Hellman key exchange protocol (DHKEP) is to


find x given (g x mod p) and to find y given (g y mod p), i.e., the attack attempts to
compute the base-g logarithms of g x modulo p, and g y modulo p, respectively.
Once the attacker obtains x, y, he can then easily compute the shared secret
information (g xy mod p).
Finding the value of x knowing (g x mod p) is the Diffie–Hellman discrete
logarithm problem.

Definition 12.2.1 The Diffie–Hellman discrete logarithm problem (DHDLP) is


stated as follows: let g ∈ Z× p be a primitive root modulo p. Given an element
h ∈ Z×p , find the unique value of x, 0 ≤ x ≤ p − 2, for which

h = (g x mod p).

The Discrete Logarithm Assumption (DLA) (Section 9.4) says for any proba-
bilistic polynomial time algorithm A, the probability that A gives a solution to the
DHDLP

h = (g x mod p)

is negligible, i.e., it is a negligible function of l.


The DLA (if true) says that the DHDLP cannot be solved in polynomial time,
and this ensures the security of the DHKEP.
The DHDLP is a special case of the general discrete logarithm problem.
We recall some basic notions of group theory from Chapter 5.
Let G be a finite group, let g ∈ G, and let H = ⟨g⟩ be the cyclic subgroup
generated by g. The order of H equals the order of the element g; it is the smallest
positive integer n so that g n = 1 (1 denotes the identity element in G). Let h ∈ H .
Then

gx = h

for a unique x with 0 ≤ x ≤ n − 1.


Definition 12.2.2 The discrete logarithm problem (DLP) is stated as follows:
given g ∈ G, h ∈ H = ⟨g⟩, n = |H |, find the unique value of x, 0 ≤ x ≤ n − 1, so
that the equation

gx = h

is satisfied.
Example 12.2.3 Take G = U (Z18 ), and let g = 5 ∈ U (Z18 ). Then ⟨5⟩ = H =
U (Z18 ). (In fact, U (Z18 ) is cyclic of order φ(18) = 6.) With h = 13 ∈ U (Z18 ), the
DLP is stated as follows: find the unique 0 ≤ x ≤ 5 so that

5^x = 13.

As one can check, the solution to the DLP is x = 4.





Example 12.2.4 In the group S4, the solution to the DLP

    (0 1 2 3)^x   (0 1 2 3)
    (2 0 3 1)   = (1 3 0 2)

is x = 3.


If the group G is additive, then the DLP seeks x so that

xg = g + g + · · · + g = h,

where the sum has x terms.

For example, given the cyclic additive group Z10 generated by g = 3, then with
h = 4, the DLP is

x · 3 = 3x = 4

in Z10, equivalently, 3x ≡ 4 (mod 10). Since 3^(−1) = 7 in U(Z10), then x = ((7 · 4) mod 10) = 8 solves the DLP.
For certain groups, the DLP is very easy to solve. For instance, suppose G is the cyclic additive group Zn, generated by g with gcd(g, n) = 1, and let y ∈ Zn. The corresponding DLP gx = y has solution x = g^(−1) y, which can be found in polynomial time O(m^3), where m = log2(n).
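In this additive case the whole computation is one modular inversion. A small Python sketch, using Python's built-in pow with exponent −1 for the modular inverse:

```python
def additive_dlp(g, h, n):
    """Solve x*g ≡ h (mod n), the DLP in the additive group Z_n,
    assuming gcd(g, n) = 1, via x = g^(-1) * h."""
    return pow(g, -1, n) * h % n

# The example above: solve 3x ≡ 4 (mod 10)
x = additive_dlp(3, 4, 10)
assert x == 8 and (3 * x) % 10 == 4
```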
For other groups, the DLP is very hard to solve, for instance, the case G = Z_p^× for a large prime p (this is the DHDLP).
In what follows, we review the most efficient (non-polynomial) algorithms for
solving the DHDLP. We begin with several methods for solving the general DLP,
i.e., the DLP for an arbitrary finite group G. These methods can then be applied to
the case G = Z× p.

12.2.1 The General DLP

Algorithm 12.2.5 (NAIVE_DLP)

Input: G, a cyclic group of order N, generated by g,
    and an element h randomly chosen from G
Output: integer i, 0 ≤ i ≤ N − 1, with g^i = h
Algorithm:
for i = 0 to N − 1 do
    s ← g^i
    if s = h, then output i
next i
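A direct Python transcription of Algorithm 12.2.5, specialized to cyclic subgroups of U(Z_n); the successive powers are built by one multiplication per step rather than recomputing g^i from scratch:

```python
def naive_dlp(g, h, n, N):
    """Exhaustive search for i with g^i = h in U(Z_n),
    where N is the order of g; returns None if h is not in <g>."""
    s = 1  # s = g^i mod n
    for i in range(N):
        if s == h:
            return i
        s = s * g % n
    return None

# Example 12.2.3: solve 5^x = 13 in U(Z_18); 5 has order 6
assert naive_dlp(5, 13, 18, 6) == 4
```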

Clearly, Algorithm 12.2.5 solves the DLP in O(N) steps. However, the running
time is non-polynomial as a function of input as measured in bits. We can improve
on the efficiency of our solution to the DLP. We first prove a lemma.
Suppose G = ⟨g⟩, N = |G|, and h ∈_R G. Thus, there exists an integer x, 0 ≤ x ≤ N − 1, with g^x = h; a solution to the DLP exists.

Lemma 12.2.6 Let n = 1 + ⌊√N⌋ and let x be an integer with 0 ≤ x ≤ N − 1. Then there exist integers q and r with x = qn + r and 0 ≤ r < n, 0 ≤ q < n.
Proof Using the Division Algorithm, write

x = nq + r, 0 ≤ r < n,

for integers q and r. We show that q < n. Suppose that q ≥ n. Then

nq ≥ n^2 = (1 + ⌊√N⌋)^2 > (√N)^2 = N > x,

and thus r < 0, which is impossible. Thus q < n.
We claim that q ≥ 0. For if q < 0, then −q > 0, and hence n(−q) ≥ n. But then r + nq ≥ 0 implies r ≥ −nq = n(−q) ≥ n, a contradiction.


Algorithm 12.2.7 (Baby-Step/Giant-Step (BSGS))
Input: G, a cyclic group of order N, generated by g,
and an element h randomly chosen from G
Output: integer i, 0 ≤ i ≤ N − 1, with g^i = h
Algorithm:


Step 1. Let n = 1 + ⌊√N⌋ and construct two sets of group elements:

S1 = {e = g^0, g = g^1, g^2, g^3, . . . , g^(n−1)},

S2 = {h, hg^(−n), hg^(−2n), hg^(−3n), . . . , hg^(−(n−1)n)}.

Step 2. Find an element hg^(−jn) in S2 that matches an element g^k in S1. Now, hg^(−jn) = g^k implies that g^(k+jn) = h. Hence a solution is i = k + jn.


Lemma 12.2.6 guarantees that Step 2 of BSGS results in a match. The BSGS
algorithm is attributed to Shanks [53].
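A compact Python sketch of BSGS for subgroups of U(Z_p); a hash table stands in for the sorting step, and the modular inverse g^(−n) is computed with Python's three-argument pow (Python 3.8+):

```python
from math import isqrt

def bsgs(g, h, p, N):
    """Baby-step/giant-step: solve g^x = h in U(Z_p), where N is
    the order of g; returns None if h is not in <g>."""
    n = 1 + isqrt(N)
    baby = {pow(g, k, p): k for k in range(n)}   # S1: g^k -> k
    giant = pow(g, -n, p)                        # g^(-n) mod p
    s = h                                        # S2 terms: h * g^(-jn)
    for j in range(n):
        if s in baby:
            return (baby[s] + j * n) % N
        s = s * giant % p
    return None

# Example 12.2.9: solve 3^x = 15 in U(Z_17)
assert bsgs(3, 15, 17, 16) == 6
```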
Proposition 12.2.8 The Baby-Step/Giant-Step algorithm solves the DLP in O(√N log2(N)) steps.


Proof We know that h = g^x for some x, 0 ≤ x ≤ N − 1. Let n = 1 + ⌊√N⌋. By Lemma 12.2.6, there exist integers q and r with 0 ≤ q, r < n and x = qn + r. Thus h = g^x = g^(qn+r) = g^(qn) g^r, and so g^r = hg^(−qn). Since 0 ≤ q, r < n, g^r ∈ S1 and hg^(−qn) ∈ S2. So there is a match; qn + r is a solution to the DLP.
Regarding the time complexity of this solution, Step 1 can be completed in O(n)
steps. To find a match in Step 2, we compare each element of S2 to all elements of
S1 . We do this by sorting all 2n elements in S1 , S2 using a standard sorting algorithm
(like MERGE_SORT). Thus Step 2 is completed in O(n log2 (n)) steps. So the total
number of steps required to implement BSGS is

O(n) + O(n log2(n)) = O(n log2(n)).



Since n ≈ √N, the time complexity of BSGS is therefore

O(√N log2(√N)) = O(√N log2(N)).



Of course, the BSGS algorithm has running time that is non-polynomial as
a function of input as measured in bits, but it is clearly more efficient than
Algorithm 12.2.5 (NAIVE_DLP).
Example 12.2.9 Let G = U (Z17 ); 3 is a primitive root modulo 17. We solve the
DLP

3^x = 15

using the BSGS algorithm. In this case, N = 16, and so n = 1 + ⌊√16⌋ = 5. Thus,

k    S1            S2
0    3^0 ≡ 1       15 · 3^0 ≡ 15
1    3^1 ≡ 3       15 · 3^(−5) ≡ 3
2    3^2 ≡ 9       15 · 3^(−10) ≡ 4
3    3^3 ≡ 10      15 · 3^(−15) ≡ 11
4    3^4 ≡ 13      15 · 3^(−20) ≡ 9

Thus a match occurs with the pairing 3^1 and 15 · 3^(−5). Thus (3^6 mod 17) = 15, and so the DLP has solution i = 6.


Remark 12.2.10 In the BSGS algorithm, if there is exactly one match between elements of S1 and S2, it cannot occur for certain pairings. For instance, we cannot have g^(n−1) = hg^(−(n−1)n) as the only match between elements of S1 and S2. For then

g^((n−1)+(n−1)n) = g^(n²−1) = h.

The condition n² > N then implies n² − 1 ≥ N, which violates Lemma 12.2.6.




There is a probabilistic version of the BSGS algorithm based on the collision
theorem of Proposition 2.3.4.
Algorithm 12.2.11 (Probabilistic BSGS)
Input: G, a cyclic group of order N, generated by g,
and an element h randomly chosen from G
Output: integer i, 0 ≤ i ≤ N − 1, with g^i = h
Algorithm:

Step 1. Choose an integer n, 1 ≤ n ≤ N, and compute the group elements

S1 = {e = g^0, g = g^1, g^2, g^3, . . . , g^(n−1)}.

Step 2. Choose an integer m ≥ 1, and let k1, k2, . . . , km be a randomly chosen sequence of integers, 0 ≤ ki < N (with replacement). Then g^(k1), g^(k2), . . . , g^(km) can be considered as a sequence of terms chosen at random from G (with replacement). Multiply each term g^(ki) by h to form the sequence of group elements

S2 = hg^(k1), hg^(k2), hg^(k3), . . . , hg^(km).

Step 3. If a match between S1 and S2 occurs, i.e., hg^(kj) = g^i for some i, j, then g^(i−kj) = h and x = ((i − kj) mod N) is a solution to the DLP g^x = h.


The success of Algorithm 12.2.11 depends on whether a match occurs on Step 3.
We can calculate the probability of such a match.
Proposition 12.2.12 In Algorithm 12.2.11, the probability that at least one term of the sequence S2 matches some element of S1 is

Pr(hg^(kj) = g^i for some i, j) = 1 − (1 − n/N)^m.

Proof Since g generates G, the powers of g in S1 are distinct. Since the terms g^(k1), g^(k2), . . . , g^(km) are chosen at random from G, multiplying each term by h results in a random sequence of terms because the map ψh : G → G defined as ψh(g) = hg is a permutation of G. The result now follows from Proposition 2.3.4.


Example 12.2.13 We take G = Z_7057^×, with primitive root g = 5. Thus N = |Z_7057^×| = 7056. Let h = 1000 ∈ G. We seek to solve the DHDLP

5^x = 1000.

In Step 1 of Algorithm 12.2.11, we take n = 60, so that

S1 = {1, 5, 5^2, 5^3, . . . , 5^59}.

In Step 2, we seek the minimal m ≥ 1 so that it is likely that a match occurs between S1 and

S2 = 1000(5^(k1)), 1000(5^(k2)), . . . , 1000(5^(km)).

Applying Proposition 12.2.12, we take m = 82, since m = 82 is minimal with

Pr(1000(5^(kj)) = 5^i for some i, j) = 1 − (1 − 60/7056)^82 > 1/2.

Thus with n = 60 and m = 82, it is likely that Algorithm 12.2.11 solves the
DHDLP.
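The minimal m can be found by accumulating the miss probability directly; a short Python check of the value m = 82 used above:

```python
def minimal_m(n, N, target=0.5):
    """Smallest m with 1 - (1 - n/N)^m > target (Proposition 12.2.12)."""
    miss = 1.0 - n / N        # probability a single term misses S1
    m, prob_miss = 1, miss
    while 1.0 - prob_miss <= target:
        prob_miss *= miss
        m += 1
    return m

assert minimal_m(60, 7056) == 82
```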



12.2.2 Index Calculus

The index calculus algorithm is a method for solving the DLP in the case G = U(Zp) that is more efficient than BSGS. We assume that p is a large random prime and g is a primitive root modulo p. Let h ∈ U(Zp). We seek x, 0 ≤ x ≤ p − 2, for which g^x = h in U(Zp).
We recall the notion of a smooth integer from Section 9.3.3. Let B ≥ 2 be a real
number. An integer m ≥ 2 is B-smooth if each prime factor of m is less than or
equal to B.
For n ≥ 2, let ψ(n, B) be the number of B-smooth integers j with 2 ≤ j ≤ n. For instance, ψ(10, 3) = 6 since 2, 3, 4, 6, 8, 9 are the only 3-smooth integers j with 2 ≤ j ≤ 10.
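Smoothness is cheap to test by trial division; a small Python sketch of the definitions above (the name psi for the counting function is an assumption):

```python
def is_b_smooth(m, B):
    """True if every prime factor of m (m >= 2) is at most B."""
    d = 2
    while d <= B and m > 1:
        while m % d == 0:
            m //= d
        d += 1
    return m == 1

def psi(n, B):
    """Number of B-smooth integers j with 2 <= j <= n."""
    return sum(1 for j in range(2, n + 1) if is_b_smooth(j, B))

assert is_b_smooth(72, 3) and not is_b_smooth(110, 3)  # 72 = 2^3 * 3^2
assert psi(10, 3) == 6  # the integers 2, 3, 4, 6, 8, 9
```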
Algorithm 12.2.14 (Index Calculus)
Input: A large prime p, a primitive root g modulo p,
and an element h randomly chosen from U (Zp )
Output: integer i, 0 ≤ i ≤ p − 2, with g^i = h
Algorithm:

Step 1. The first step in the index calculus is to choose a relatively small bound B and then find more than π(B) residues (g^i mod p) for which (g^i mod p) is B-smooth. These residues are randomly generated by choosing a random sequence of integers m1, m2, m3, . . . and checking to see whether each 2 ≤ (g^(mj) mod p) ≤ p − 1 is B-smooth.

Step 2. Once π(B) such residues have been found, they form a system (re-indexing the mj if necessary):

g^(m1) ≡ q1^(e_{1,1}) q2^(e_{1,2}) · · · qk^(e_{1,k}) (mod p)
g^(m2) ≡ q1^(e_{2,1}) q2^(e_{2,2}) · · · qk^(e_{2,k}) (mod p)
. . .
g^(mr) ≡ q1^(e_{r,1}) q2^(e_{r,2}) · · · qk^(e_{r,k}) (mod p)          (12.1)

for some k, r, with k ≤ π(B) ≤ r. Also, the primes satisfy 2 ≤ qb ≤ B, and e_{a,b} ≥ 0 for all 1 ≤ a ≤ r, 1 ≤ b ≤ k.
Let DLOG = DLOG_{p,g}. For each 1 ≤ i ≤ r,

g^(mi) g^(−DLOG(q1^(e_{i,1}) q2^(e_{i,2}) · · · qk^(e_{i,k}))) ≡ g^(mi − DLOG(q1^(e_{i,1}) q2^(e_{i,2}) · · · qk^(e_{i,k}))) ≡ 1 (mod p).

By Proposition 5.7.3,

mi − DLOG(q1^(e_{i,1}) q2^(e_{i,2}) · · · qk^(e_{i,k}))

is a multiple of the order of g in U(Zp), which is p − 1. Using familiar laws of logarithms,

DLOG(q1^(e_{i,1}) q2^(e_{i,2}) · · · qk^(e_{i,k})) = e_{i,1} DLOG(q1) + e_{i,2} DLOG(q2) + · · · + e_{i,k} DLOG(qk),

thus the system (12.1) yields the r × k linear system, taken modulo p − 1:

e_{1,1} DLOG(q1) + e_{1,2} DLOG(q2) + · · · + e_{1,k} DLOG(qk) ≡ m1
e_{2,1} DLOG(q1) + e_{2,2} DLOG(q2) + · · · + e_{2,k} DLOG(qk) ≡ m2
. . .
e_{r,1} DLOG(q1) + e_{r,2} DLOG(q2) + · · · + e_{r,k} DLOG(qk) ≡ mr.

We think of DLOG(qt), 1 ≤ t ≤ k, as variables. Since r ≥ k, there are at least as many equations as variables, and so the system has a solution in the DLOG(qt), 1 ≤ t ≤ k ≤ π(B).

Step 3. We find an integer k for which (hg^(−k) mod p) is B-smooth. For such k,

h ≡ g^k q1^(e1) q2^(e2) · · · qk^(ek) (mod p)

for qt ≤ B, et ≥ 0. Thus

DLOG(h) ≡ k + e1 DLOG(q1) + e2 DLOG(q2) + · · · + ek DLOG(qk)

modulo p − 1. The discrete log DLOG(h) is then computed by substituting in the values of DLOG(qt) from Step 2.


Example 12.2.15 We take p = 997 and g = 7 and use Algorithm 12.2.14 to compute DLOG_{997,7}(831), i.e., we find the unique x, 0 ≤ x ≤ 995, so that

7^x = 831

in Z_997^×.
Step 1. We choose B = 3 so that π(3) = 2. We select random i, 0 ≤ i ≤ 995, until we obtain three residues (7^i mod 997) that are 3-smooth. The residues are

(7^615 mod 997) = 2^3 · 3^2
(7^231 mod 997) = 2 · 3^5
(7^15 mod 997) = 2^5 · 3.

Step 2. Applying DLOG yields

3 · DLOG(2) + 2 · DLOG(3) ≡ 615 (mod 996)
DLOG(2) + 5 · DLOG(3) ≡ 231 (mod 996)
5 · DLOG(2) + DLOG(3) ≡ 15 (mod 996).

The system has solution DLOG(2) = 201, DLOG(3) = 6.


Step 3. We find that

831 · 7^(−162) ≡ 2^3 · 3^4 (mod 997),

and thus

DLOG(831) = 162 + 3 · DLOG(2) + 4 · DLOG(3)
          = 162 + 3 · 201 + 4 · 6
          = 789.
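Every step of this example can be checked with modular exponentiation; the following Python snippet verifies the smooth relations, the solved factor-base logarithms, and the final answer:

```python
p, g = 997, 7

# Step 1: the three 3-smooth relations
assert pow(g, 615, p) == 2**3 * 3**2   # 72
assert pow(g, 231, p) == 2 * 3**5      # 486
assert pow(g, 15, p) == 2**5 * 3       # 96

# Step 2: the factor-base logarithms
dlog2, dlog3 = 201, 6
assert pow(g, dlog2, p) == 2 and pow(g, dlog3, p) == 3

# Step 3: assemble DLOG(831) modulo p - 1 = 996
x = (162 + 3 * dlog2 + 4 * dlog3) % (p - 1)
assert x == 789 and pow(g, x, p) == 831
```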



12.2.3 Efficiency of Index Calculus

How efficient is the Index Calculus algorithm?


Due to the theorem of E. R. Canfield, P. Erdős and C. Pomerance (Theo-
rem 9.3.18), and its corollary (Corollary 9.3.19), we can complete Step 1 in a
reasonable (yet non-polynomial) amount of time.
Proposition 12.2.16 The Index Calculus algorithm solves the DLP in subexponential time

O(2^(c (log2(p))^(1/3) (log2(log2(p)))^(2/3))),

where c is a small constant.


Proof (Sketch) Let p be a large prime. The most time-consuming part of the algorithm is Step 1. By Corollary 9.3.19, in a random sequence of ⌈L(p)^√2⌉ integers modulo p, we expect to find π(L(p)^(1/√2)) integers that are L(p)^(1/√2)-smooth.
If an integer (i mod p) is chosen at random, then the integer (g^i mod p) is also (essentially) chosen at random since the map

DEXP_{p,g} : U(Zp) → U(Zp)

is a permutation. Therefore in a random sequence of ⌈L(p)^√2⌉ residues (g^i mod p), we expect to find π(L(p)^(1/√2)) residues (g^i mod p) that are L(p)^(1/√2)-smooth. Thus Step 1 takes time L(p)^√2 = e^(√2 (ln(p))^(1/2) (ln(ln(p)))^(1/2)). Ultimately this can be reduced to an overall running time of e^(a (ln(p))^(1/3) (ln(ln(p)))^(2/3)), where a is a small constant. Changing base yields run time

e^(a (ln(p))^(1/3) (ln(ln(p)))^(2/3)) = O(2^(c (log2(p))^(1/3) (log2(log2(p)))^(2/3))),

where c is a small constant.





In Example 12.2.15, p = 997. Thus

L(p) = L(997) ≈ 38.57451136,

and so ⌈L(997)^√2⌉ = 176, L(997)^(1/√2) ≈ 13.23377643, and π(13) = 6. In view of Corollary 9.3.19, we would need to choose 176 random residues (7^i mod 997) to expect 6 integers to be 13-smooth. Of course, p = 997 is not a large prime. In Example 12.2.15, with some trial and error, we found 3 residues (7^i mod 997) that were 3-smooth.
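These numerical estimates are easy to reproduce; a Python check, with L as in Exercise 6 and a naive prime-counting function:

```python
from math import ceil, exp, log, sqrt

def L(n):
    """L(n) = e^((ln n)^(1/2) (ln ln n)^(1/2))."""
    return exp(sqrt(log(n)) * sqrt(log(log(n))))

def prime_pi(x):
    """pi(x) by trial division (fine for small x)."""
    def is_prime(k):
        return k >= 2 and all(k % d for d in range(2, int(k**0.5) + 1))
    return sum(1 for k in range(2, int(x) + 1) if is_prime(k))

assert round(L(997), 2) == 38.57
assert ceil(L(997) ** sqrt(2)) == 176
assert round(L(997) ** (1 / sqrt(2)), 2) == 13.23
assert prime_pi(13) == 6
```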
Remark 12.2.17 Godušová [20] has extended the Index Calculus method to U(F_{p^m}) = F_{p^m}^×.
We ask: Can we extend the Index Calculus method to arbitrary groups?
Suppose G is any finite cyclic group of order n generated by h, ⟨h⟩ = G. By a theorem of Dirichlet, there exists a prime p of the form p = nk + 1 for some k [47, Theorem 3.3]. Now,

U(Zp) = U(Z_{nk+1}),

which is a cyclic group of order nk.
Let g be a primitive root modulo p, so that ⟨g⟩ = U(Zp). Then U(Zp) has a cyclic subgroup H of order n generated by g^k; ⟨g^k⟩ = H. Moreover, there is a group isomorphism

ψ : G → H

defined as h → g^k. Thus G can be embedded into U(Zp); G is essentially a cyclic subgroup of U(Zp) of order n.
So, given the DLP

h^x = a

in G, we can apply the map ψ to yield the DLP in U(Zp):

ψ(h)^x = ψ(a),

or

g^(xk) = b,     (12.2)

with ψ(a) = b. The DLP (12.2) could then be solved using Index Calculus, obtaining xk; dividing by k would then give x.
Unfortunately, there seems to be no easy way to find the required integer k so
that nk + 1 is prime, i.e., the embedding ψ is difficult to compute. Even if we were
able to find xk to solve the DLP (12.2), we need k to find x.

Moreover, for an arbitrary finite cyclic group G, there seems to be no obvious


way to compute the image ψ(a) = b without already knowing the discrete logarithm
of a in G.
So there seems to be no practical way to extend the Index Calculus method to
arbitrary groups using the notion of an embedding ψ : G → H ≤ U (Zp ).
For certain groups however, i.e., cyclic subgroups of the elliptic curve group Ens(K), one can compute the embedding ψ and thus solve the DLP; see Section 14.2.



12.2.4 The Man-in-the-Middle Attack

We close this chapter with another type of attack on the DHKEP; the “Man-in-the-
Middle” attack is a “network” attack on the DHKEP.
In the attack, the notation Malice(“Alice”) indicates that Malice is posing as Alice, and the notation Malice(“Bob”) indicates that Malice is posing as Bob.
Attack 12.2.18 (Man-in-the-Middle Attack on DHKEP)
Premise: Alice and Bob share a large prime p and g, a primitive root modulo p.
Result of Attack: Alice and Bob think they are sharing a new element of Z_p^×, but actually, Alice is sharing a new element with Malice and Bob is sharing a new element with Malice.

1. Alice randomly chooses an integer x, 0 ≤ x ≤ p − 2. She computes h = (g^x mod p) and sends h to Malice(“Bob”).
1'. Malice(“Alice”) randomly chooses an integer m, 0 ≤ m ≤ p − 2. He computes l = (g^m mod p) and sends l to Bob.
2. Bob randomly chooses an integer y, 0 ≤ y ≤ p − 2. He computes k = (g^y mod p) and sends k to Malice(“Alice”).
2'. Malice(“Bob”) sends l to Alice.
3. Alice computes (l^x mod p) = ((g^m)^x mod p) = (g^(mx) mod p).
4. Bob computes (l^y mod p) = ((g^m)^y mod p) = (g^(my) mod p).


After Step 3 in the attack, Alice and Malice share the integer (g^(mx) mod p), and after Step 4, Bob and Malice share (g^(my) mod p) (Figure 12.1).
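The attack is easy to simulate; a Python sketch (again with the illustrative parameters p = 2861, g = 2) confirming that Malice ends up sharing one key with each victim:

```python
import random

p, g = 2861, 2                 # shared prime and primitive root

x = random.randrange(p - 1)    # Alice's secret
y = random.randrange(p - 1)    # Bob's secret
m = random.randrange(p - 1)    # Malice's secret

h = pow(g, x, p)               # 1.  Alice -> Malice("Bob")
l = pow(g, m, p)               # 1'. Malice("Alice") -> Bob
k = pow(g, y, p)               # 2.  Bob -> Malice("Alice")
                               # 2'. Malice("Bob") sends l to Alice
alice_key = pow(l, x, p)       # 3.  Alice computes g^(mx)
bob_key = pow(l, y, p)         # 4.  Bob computes g^(my)

assert alice_key == pow(h, m, p)   # Malice shares g^(mx) with Alice
assert bob_key == pow(k, m, p)     # Malice shares g^(my) with Bob
```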
As a result of the Man-in-the-Middle attack, Malice will possess the residues (g^x mod p) and (g^y mod p). His goal is to determine the residue (g^(xy) mod p) possessed by Alice and Bob. Can he use his knowledge of (g^x mod p) and (g^y mod p) to determine (g^(xy) mod p)?
Malice wants to solve the “computational DHDLP.”

Fig. 12.1 Man-in-the-Middle attack on DHKEP: messages 1 and 2 from Alice and Bob are intercepted by Malice, who answers with messages 1' and 2'

Definition 12.2.19 The Computational Diffie–Hellman discrete logarithm problem (CDHDLP) is stated as follows: let g be a primitive root modulo p. Given the residues (g^x mod p) and (g^y mod p), find the residue (g^(xy) mod p).
It is clear that the CDHDLP is no harder than the DHDLP: if the discrete logarithms x, y can be found, then (g^(xy) mod p) can be easily computed. On the other hand, if an efficient solution to the CDHDLP exists, it is not known whether this would imply an efficient solution to the DHDLP.

12.3 Exercises

1. Alice and Bob are using the DHKEP to exchange keys using the group G =
U (Z2861 ) and primitive root g = 2. Alice randomly chooses x = 623 and Bob
randomly chooses y = 14.
(a) Compute the key k shared by Alice and Bob. Write k as a 12-bit binary
number.
(b) Suppose Alice and Bob are using the Vernam cipher with message length 12
to communicate securely. Compute

C = e(100101111000, k).

2. Use the Baby-Step/Giant-Step algorithm to solve the DLP

2^x = 887

in U (Z2861 ).
3. Find the minimal size of a prime p (in bits) to avert an attack on the DHKEP
using the BSGS algorithm.
4. Use Index Calculus to solve the DLP

2^x = 100

in U(Z2861). Hint: 2^1117 = 110, 2^243 = 1485, and 2^1416 = 2711 are 3-smooth residues in Z2861.
5. Find the minimal size of a prime p (in bits) to safeguard against an attack on the
DHKEP using Index Calculus.
6. For an integer n ≥ 1, let L(n) = e^((ln(n))^(1/2) (ln(ln(n)))^(1/2)). Prove that L(n) = O(2^((log2(n))^(1/2) (log2(log2(n)))^(1/2))).
7. Let p = 2^31 − 1 = 2147483647 be the Mersenne prime.
(a) Use Exercise 6 to approximate B = L(2^31 − 1)^(1/√2).
(b) Estimate the number of random integers modulo 2^31 − 1 that need to be selected in order to find π(B) integers that are B-smooth.
8. Let U(Z100) be the units group of the residue ring Z100.
(a) Show that 3 has order 20 in U(Z100).
(b) Prove that ⟨3⟩ ≤ U(Z100) can be embedded into the group of units U(Zp) for some prime p.
(c) Use (a) and (b) to write the DLP 3^x = 23 in ⟨3⟩ as a DLP in U(Zp).
Chapter 13
Elliptic Curves in Cryptography

The Diffie–Hellman protocol uses the group U (Zp ) to exchange keys. Other groups
can be employed in a Diffie–Hellman-type protocol. For instance, we could use an
elliptic curve group.
In this chapter we introduce the elliptic curve group and show how it can be used
to strengthen the Diffie–Hellman key exchange protocol.

13.1 The Equation y^2 = x^3 + ax + b

Let K be a field. Let

x^3 + a2 x^2 + a1 x + a0     (13.1)

be a monic cubic polynomial over K. Assuming char(K) ≠ 3, the transformation x → x − a2/3 yields a somewhat simpler equation:

(x − a2/3)^3 + a2 (x − a2/3)^2 + a1 (x − a2/3) + a0
  = x^3 − 3x^2 (a2/3) + 3x (a2^2/9) − a2^3/27 + a2 (x^2 − 2x (a2/3) + a2^2/9) + a1 x − a1 a2/3 + a0
  = x^3 + (−a2^2/3 + a1) x + (2a2^3/27 − a1 a2/3 + a0).

© Springer Nature Switzerland AG 2022
R. G. Underwood, Cryptography for Secure Encryption, Universitext,
https://doi.org/10.1007/978-3-030-97902-7_13

So with a = −a2^2/3 + a1 and b = 2a2^3/27 − a1 a2/3 + a0, the cubic (13.1) can be written as

x^3 + ax + b.
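The substitution can be verified with exact rational arithmetic; a short Python check using hypothetical sample coefficients:

```python
from fractions import Fraction as F

a2, a1, a0 = F(6), F(-4), F(9)          # arbitrary sample coefficients

a = -a2**2 / 3 + a1                     # coefficient of x after the shift
b = 2 * a2**3 / 27 - a1 * a2 / 3 + a0   # new constant term

def cubic(t):
    """The original monic cubic t^3 + a2 t^2 + a1 t + a0."""
    return t**3 + a2 * t**2 + a1 * t + a0

def depressed(t):
    """The depressed cubic t^3 + a t + b."""
    return t**3 + a * t + b

# Substituting x - a2/3 into the original cubic gives the depressed form
for x in range(-3, 4):
    assert cubic(F(x) - a2 / 3) == depressed(F(x))
```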

Consider the equation

y^2 = x^3 + ax + b     (13.2)

over K. The graph of equation (13.2) is defined as

{(x, y) ∈ K × K : y^2 = x^3 + ax + b}.

The graph can be viewed as the collection of zeros in K × K of the function

f(x, y) = y^2 − (x^3 + ax + b).

Let K̄ be an algebraic closure of K (for instance, if K = R, then K̄ = C). Let (s, t) be a zero of the polynomial f(x, y) = y^2 − (x^3 + ax + b) in K̄ × K̄. The Taylor series expansion of f(x, y) about the point (s, t) is

f(x, y) = ∂f/∂x(s, t)(x − s) + ∂f/∂y(s, t)(y − t)
          + (1/2!)(∂²f/∂x²(s, t)(x − s)^2 + ∂²f/∂y²(s, t)(y − t)^2)
          + (1/3!) ∂³f/∂x³(s, t)(x − s)^3
        = (3s^2 + a)(x − s) + 2t(y − t) + 3s(x − s)^2 + (y − t)^2 + (x − s)^3.

The linear form L of the Taylor series expansion is the first two terms of the expansion, thus

L = ∂f/∂x(s, t)(x − s) + ∂f/∂y(s, t)(y − t) = (3s^2 + a)(x − s) + 2t(y − t).

The tangent space to the graph of equation (13.2) at the point (s, t) is defined as

Θ_(s,t) = {(x, y) ∈ K̄ × K̄ : (3s^2 + a)(x − s) + 2t(y − t) = 0};

Θ_(s,t) is the graph of the equation L = 0 in K̄ × K̄, cf. [52, Chapter II, Sections 1.2–1.5].

The graph of (13.2) is smooth if the graph contains no points in K̄ × K̄ for which both partial derivatives ∂f/∂x and ∂f/∂y vanish simultaneously. In other words, the graph is smooth if there is no point (s, t) ∈ K̄ × K̄ on the graph for which

∂f/∂x(s, t) = 0 and ∂f/∂y(s, t) = 0.

A graph y^2 = x^3 + ax + b is singular if it is not smooth. If a graph is singular, then there exists a point (s, t) ∈ K̄ × K̄ on the graph where both partials vanish, and this is a singular point on the graph.
Proposition 13.1.1 The graph of y^2 = x^3 + ax + b is smooth if and only if the dimension of the tangent space at every point of the graph is 1.
Proof Suppose the graph of y^2 = x^3 + ax + b is not smooth. Then there exists a singular point (s, t) on the graph. At this singular point (s, t), the linear part of the Taylor series expansion of f(x, y) = y^2 − (x^3 + ax + b) is identically zero. Thus

Θ_(s,t) = {(x, y) ∈ K̄ × K̄ : (3s^2 + a)(x − s) + 2t(y − t) = 0} = K̄^2,

and so dim(Θ_(s,t)) = 2 ≠ 1.
For the converse, suppose that dim(Θ_(s,t)) ≠ 1 at some point (s, t) on the graph. Now, L = 0 is a linear equation in x and y, and so its solutions have dimension 1 in K̄^2 unless the coefficients of x and y are both 0. It follows that both partials vanish at (s, t), and so the graph is not smooth.

Proposition 13.1.2 If K is a field with char(K) = 2, then every curve of the form y^2 = x^3 + ax + b is singular.
Proof Given y^2 = x^3 + ax + b over K, let α, β ∈ K̄ with α^2 = −a, β^2 = b. Then

β^2 = α^3 + aα + b,

so that (α, β) is a point of y^2 = x^3 + ax + b in K̄ × K̄ with ∂f/∂x(α, β) = 3α^2 + a = −3a + a = −2a = 0 and ∂f/∂y(α, β) = 2β = 0.
Whether or not a graph y^2 = x^3 + ax + b is smooth depends entirely on the value of its “elliptic” discriminant (not to be confused with the polynomial discriminant). The elliptic discriminant of the equation y^2 = x^3 + ax + b is given as

D_E = −16(4a^3 + 27b^2).
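The discriminant is a one-line computation; a Python helper checking the two examples discussed below in this section:

```python
def elliptic_discriminant(a, b):
    """D_E = -16(4a^3 + 27b^2) for the curve y^2 = x^3 + ax + b."""
    return -16 * (4 * a**3 + 27 * b**2)

assert elliptic_discriminant(-1, 6) == -15488   # y^2 = x^3 - x + 6: smooth
assert elliptic_discriminant(-3, 2) == 0        # y^2 = x^3 - 3x + 2: singular
```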

Proposition 13.1.3 The graph of y^2 = x^3 + ax + b is smooth if and only if its elliptic discriminant D_E is non-zero.
Proof Suppose y^2 = x^3 + ax + b is not smooth. If char(K) = 2, then D_E = 0. So we assume that char(K) ≠ 2. There exists a point (α, β) ∈ K̄ × K̄ with β^2 = α^3 + aα + b, 2β = 0, and 3α^2 + a = 0. Thus

0 = 2β · β = 2β^2 = 2α^3 + 2aα + 2b,

and so α is a root of the polynomial 2x^3 + 2ax + 2b in K̄. Moreover, 6α^2 + 2a = 0, and so α is a root of multiplicity > 1. Consequently, α is a root of x^3 + ax + b of multiplicity > 1. Thus by Ireland and Rosen [28, Chapter 19, Section 1, Lemma 1], −4a^3 − 27b^2 = 0, and so D_E = 0. Note also that the elliptic discriminant is the ordinary polynomial discriminant of 2x^3 + 2ax + 2b, see [36, page 211].
Conversely, suppose D_E = 0. If char(K) = 2, then y^2 = x^3 + ax + b is not smooth by Proposition 13.1.2. So we assume that char(K) ≠ 2. From D_E = 0, we obtain −4a^3 − 27b^2 = 0, and so by Ireland and Rosen [28, Chapter 19, Section 1, Lemma 1], x^3 + ax + b has a zero α in K̄ of multiplicity > 1. Thus 3α^2 + a = 0. The point (α, 0) ∈ K̄ × K̄ is a singular point of y^2 = x^3 + ax + b since 0^2 = 0 = α^3 + aα + b with 2 · 0 = 0 and 3α^2 + a = 0. Thus y^2 = x^3 + ax + b is not smooth.
Example 13.1.4 Let K = R; an algebraic closure of R is C. Let y^2 = x^3 − x + 6. The elliptic discriminant is D_E = −16(4(−1)^3 + 27(6)^2) = −15488 ≠ 0, and thus the graph is smooth. The point (−2, 0) is on the graph. We compute the tangent space

Θ_(−2,0) = {(x, y) : (3(−2)^2 − 1)(x + 2) + 2(0)(y − 0) = 11(x + 2) = 0},

and thus Θ_(−2,0) is the graph of x + 2 = 0. The curve and the tangent space restricted to R × R are given in Figure 13.1.


Example 13.1.5 Let K = R, K̄ = C, and y^2 = x^3 − 3x + 2. The elliptic discriminant is D_E = −16(4(−3)^3 + 27(2)^2) = 0. Thus the graph is singular. The point (1, 0) is a singular point of the graph. We have Θ_(1,0) = C^2. The tangent space at (1, 0) contains two lines tangent to the curve at (1, 0). They can be found by solving the equation Q = 0, where

Q = (1/2!)(∂²f/∂x²(s, t)(x − s)^2 + ∂²f/∂y²(s, t)(y − t)^2)

is the quadratic form of the expansion of f(x, y). We have

Q = −3(x − 1)^2 + y^2 = (y − √3(x − 1))(y + √3(x − 1)).

Fig. 13.1 Graph of y^2 = x^3 − x + 6, with 1-dimensional tangent space Θ_(−2,0)

Fig. 13.2 Graph of y^2 = x^3 − 3x + 2, with singular point (1, 0) and tangent cone T_(1,0)

Thus the tangent lines are y = √3 x − √3 and y = −√3 x + √3. The tangent cone at the singular point (1, 0) is the collection of the tangent lines at (1, 0),

T_(1,0) = {y = √3 x − √3, y = −√3 x + √3},

[52, Chapter II, Section 1.5]; see Figure 13.2.





13.2 Elliptic Curves

Definition 13.2.1 Let K be a field. An elliptic curve over K is the graph of an equation

y^2 = x^3 + ax + b     (13.3)

over K which is smooth, together with a “point at infinity” O = (∞, ∞). We denote an elliptic curve over K by E(K). Equivalently, an elliptic curve over K is defined as

E(K) = {(x, y) ∈ K × K : y^2 = x^3 + ax + b} ∪ {O},

where D_E = −16(4a^3 + 27b^2) ≠ 0; (13.3) is the Weierstrass equation.


According to our definition, there are no elliptic curves over fields of characteristic 2, cf. Proposition 13.1.2. An elliptic curve in characteristic 2 can be defined, however, as the graph of a generalized Weierstrass equation

y^2 + a1 xy + a3 y = x^3 + a2 x^2 + a4 x + a6.

For example, the Koblitz curve

y^2 + xy = x^3 + 1

is smooth over F2 and defines an elliptic curve in characteristic 2. We will not consider the characteristic 2 case in this book. The interested reader should see [63, Chapter 2, Section 2.7], [25, Chapter 5, Section 5.7], and [32, Chapter VI] for treatments of the characteristic 2 case.
Proposition 13.2.2 Let y^2 = x^3 + ax + b be an elliptic curve over K. Then the cubic x^3 + ax + b has distinct zeros in an algebraic closure K̄ of K.
Proof Since y^2 = x^3 + ax + b is an elliptic curve, D_E = −16(4a^3 + 27b^2) ≠ 0, and so D = −4a^3 − 27b^2 ≠ 0, where D is the ordinary polynomial discriminant of the cubic.
Let r1, r2, and r3 be the zeros of x^3 + ax + b in K̄. By Ireland and Rosen [28, Chapter 19, Section 1, Lemma 1],

D = (r1 − r2)^2 (r1 − r3)^2 (r2 − r3)^2,

and so the zeros of the cubic are distinct.


Fig. 13.3 Elliptic curve over R defined by y^2 = x^3 − 4x; the cubic has three real roots r1 = −2, r2 = 0, and r3 = 2

Elliptic curves over R are the easiest to visualize. The equation y^2 = x^3 − x + 6 defines an elliptic curve E(R) over R whose graph is given in Figure 13.1. On the other hand, y^2 = x^3 − 3x + 2 does not define an elliptic curve.
Given an elliptic curve y^2 = x^3 + ax + b over R, the cubic x^3 + ax + b has three distinct real zeros if D > 0. If D < 0, then the cubic has one real zero and two complex zeros [48, Proposition 4.60].
For example, the elliptic curve y^2 = x^3 − x + 6 has cubic x^3 − x + 6 with exactly one real root r = −2, and the complex roots are 1 ± i√2. On the other hand, the elliptic curve y^2 = x^3 − 4x over R with cubic x^3 − 4x has three real roots r1 = −2, r2 = 0, and r3 = 2; the graph of E(R) is given in Figure 13.3.
Elliptic curves can be defined over finite fields of characteristic not 2.
Example 13.2.3 Let K = F5. The graph of

y^2 = x^3 + 4x + 1

over F5 is an elliptic curve since the elliptic discriminant

D_E = −16(4(4^3) + 27(1)^2) = 2 ≠ 0

in F5. We have

E(F5) = {(x, y) ∈ F5 × F5 : y^2 = x^3 + 4x + 1} ∪ {O}.



In this case, E(F5) contains a finite number of points, which can be easily found using a table:

x    x^3 + 4x + 1    y       points
0    1               ±1      (0, 1), (0, 4)
1    1               ±1      (1, 1), (1, 4)
2    2               none    none
3    0               0       (3, 0)
4    1               ±1      (4, 1), (4, 4)

There is no point in E(F5) with x-coordinate 2 since (2^3 + 4 · 2 + 1 mod 5) = 2 is not a quadratic residue mod 5, that is, the Legendre symbol (2/5) = −1. We have

E(F5) = {(0, 1), (0, 4), (1, 1), (1, 4), (3, 0), (4, 1), (4, 4), O}.
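For a small prime the points can also be found by brute force; a Python sketch that reproduces the table:

```python
def affine_points(p, a, b):
    """Affine points of y^2 = x^3 + ax + b over F_p (brute force)."""
    return [(x, y) for x in range(p) for y in range(p)
            if (y * y - (x**3 + a * x + b)) % p == 0]

pts = affine_points(5, 4, 1)
assert sorted(pts) == [(0, 1), (0, 4), (1, 1), (1, 4), (3, 0), (4, 1), (4, 4)]
# together with O, the curve E(F_5) has 8 points
```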



13.3 Singular Curves

Let K be a field and let y^2 = x^3 + ax + b with D_E = 0. By Proposition 13.1.3, the graph of the equation is not smooth and hence does not define an elliptic curve over K; the graph contains at least one singular point. Such curves are still of interest, however, and will be discussed in detail in Chapter 14.
Assuming char(K) ≠ 2, D_E = 0 implies D = 0. Thus the cubic x^3 + ax + b has either a double root or a triple root. The root r with multiplicity > 1 gives rise to a singular point (r, 0) on the graph of y^2 = x^3 + ax + b.
Example 13.3.1 Let K = R and y^2 = x^3 − 3x − 2. Then D_E = D = 0 and x^3 − 3x − 2 has a double root at x = −1. The only singular point on the graph is (−1, 0), see Figure 13.4.


We let Ens(K) denote the set of all non-singular points on the curve y^2 = x^3 + ax + b over K, together with O. For example, for the curve y^2 = x^3 − 3x − 2,

Ens(R) = {(x, y) : y^2 = x^3 − 3x − 2, (x, y) ≠ (−1, 0)} ∪ {O}.
Here is an example over a finite field.


Example 13.3.2 Let K = F7 and y 2 = x 3 + 4x + 5. Then (6, 0) is the only singular
point and

Ens (F7 ) = {(2, 0), (3, 3), (3, 4), (4, 1), (4, 6), O}.
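Such examples can be checked by discarding the points where both partials 2y and 3x^2 + a vanish; a brute-force Python sketch (assuming p is an odd prime):

```python
def nonsingular_affine_points(p, a, b):
    """Non-singular affine points of y^2 = x^3 + ax + b over F_p."""
    on_curve = [(x, y) for x in range(p) for y in range(p)
                if (y * y - (x**3 + a * x + b)) % p == 0]
    return [(x, y) for (x, y) in on_curve
            if not ((2 * y) % p == 0 and (3 * x * x + a) % p == 0)]

# Example 13.3.2: y^2 = x^3 + 4x + 5 over F_7; (6, 0) is singular
pts = nonsingular_affine_points(7, 4, 5)
assert sorted(pts) == [(2, 0), (3, 3), (3, 4), (4, 1), (4, 6)]
```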



Fig. 13.4 Graph of y^2 = x^3 − 3x − 2; the singular point is (−1, 0)

13.4 The Elliptic Curve Group

Let K be a field and let E(K) be an elliptic curve over K. We can put a group
structure on the points of E(K); the special element O will serve as the identity
element for the group. This group will be the “elliptic curve” group E(K). Since
elliptic curves over R are easier to visualize, we first show how the elliptic curve
group is constructed for the case K = R.
Let E(R) be an elliptic curve over R,

E(R) = {(x, y) ∈ R × R : y 2 = x 3 + ax + b} ∪ {O}.

We begin by defining a binary operation

+ : E(R) × E(R) → E(R)

on E(R). (As we shall see, the elliptic curve group is abelian, and thus we denote the
binary operation by “+”; the binary operation is not the component-wise addition
of the coordinates of the points.)
Let P1 = (x1, y1) and P2 = (x2, y2) be points of E(R).
Case 1. P1 ≠ P2, x1 ≠ x2. The line P1P2 intersects E(R) at a point P′. Let P3 be the reflection of P′ through the x-axis. We define

P1 + P2 = P3.

Case 2. P1 ≠ P2, x1 = x2, y1 ≠ y2. In this case, the line P1P2 is vertical and intersects E(R) at the point at infinity O. The reflection of O through the x-axis is again O. So we define

P1 + P2 = O.

Case 3. P1 = P2, y1 = y2 ≠ 0. Since P1 is a non-singular point of y^2 = x^3 + ax + b, the dimension of the tangent space Θ_P1 is 1. Hence there is a unique tangent line to the curve at the point P1. The tangent line intersects the curve at a point P′; reflecting P′ through the x-axis yields a point P3 and we define

P1 + P2 = P1 + P1 = 2P1 = P3.

Case 4. P1 = P2, y1 = y2 = 0. As in Case 3, there is a unique tangent line to the curve at the point P1, and in this case the tangent line is vertical and intersects the curve at O; reflecting O through the x-axis yields O and we define

P1 + P2 = P1 + P1 = 2P1 = O.

Case 5. P2 = O. As in Case 2, the line P1O is vertical and intersects E(R) at a point P′ which is the reflection of P1 through the x-axis. Reflecting P′ back through the x-axis yields P1, so we define

P1 + O = P1.

In the manner described above, we define a binary operation on E(R). Case 1 is illustrated in Figure 13.5.

We can derive explicit formulas for the addition of points in E(R).

Fig. 13.5 Case 1: addition of points, P1 + P2 = P3

Proposition 13.4.1 (Binary Operation on E(R)) Let E(R) be an elliptic curve
and let P1 = (x1, y1) and P2 = (x2, y2) be points of E(R). There exists a binary
operation on E(R) defined as follows:
(i) If P1 ≠ P2 and x1 ≠ x2, then

P1 + P2 = P3 = (x3 , y3 ),

where

x3 = m^2 − x1 − x2 and y3 = m(x1 − x3) − y1,

with m = (y2 − y1)/(x2 − x1).
(ii) If P1 ≠ P2 and x1 = x2, y1 ≠ y2, then

P1 + P2 = O.

(iii) If P1 = P2 and y1 = y2 ≠ 0, then

P1 + P2 = P1 + P1 = 2P1 = P3 = (x3 , y3 ),

where

x3 = m^2 − 2x1 and y3 = m(x1 − x3) − y1,

with m = (3x1^2 + a)/(2y1).
(iv) If P1 = P2 and y1 = y2 = 0, then

P1 + P2 = P1 + P1 = 2P1 = O.

(v) If P2 = O, then

P1 + O = P1 .

Proof
(i) P1 ≠ P2, x1 ≠ x2. In this case, the line through P1 and P2 has equation

y − y1 = m(x − x1),   m = (y2 − y1)/(x2 − x1).

We seek the intersection of this line with E(R). We find x so that

(m(x − x1) + y1)^2 = x^3 + ax + b.    (13.4)

By construction, x1 and x2 are solutions to (13.4). The third solution is obtained
by rewriting Equation (13.4) as

x^3 − m^2 x^2 + (2m^2 x1 − 2my1 + a)x + (2mx1y1 − m^2 x1^2 − y1^2 + b) = 0.

Thus x3 = m^2 − x1 − x2 is the x-coordinate of the third point P′ of intersection
of the line through P1 and P2 with E(R). The y-coordinate of P′ is therefore y′ = m(x3 − x1) + y1.
The reflection of P′ through the x-axis is the point P3 = (x3, y3), where y3 =
m(x1 − x3) − y1. Thus, in Case 1,

P1 + P2 = P3 = (x3 , y3 ),

where

x3 = m^2 − x1 − x2 and y3 = m(x1 − x3) − y1,

with m = (y2 − y1)/(x2 − x1).
(ii) P1 ≠ P2, x1 = x2, y1 ≠ y2. In this case, the line through P1 and P2 is vertical
and intersects E(R) at the point at infinity O. The reflection of O through the x-axis is again
O. So we define

P1 + P2 = O.

(iii) P1 = P2, y1 = y2 ≠ 0. Since P1 is a non-singular point of y^2 = x^3 + ax + b,
the dimension of the tangent space at P1 is 1. Hence there is a unique tangent
line to the curve at the point P1. By implicit differentiation, the slope of the
tangent line is m = (3x1^2 + a)/(2y1), and thus the equation for the tangent line is

y − y1 = m(x − x1).

We seek the intersection of this line with E(R). We find x so that

(m(x − x1) + y1)^2 = x^3 + ax + b.    (13.5)

Clearly, x1 is a solution. Taking derivatives yields

2m(m(x − x1) + y1) = 3x^2 + a,

and so x1 is a zero of (13.5) of multiplicity ≥ 2.


Rewriting (13.5) as a cubic equal to zero, the coefficient of x^2 is −m^2, so the three roots (counted with multiplicity) sum to m^2. Since x1 is a root of multiplicity ≥ 2, x3 = m^2 −
2x1 is the x-coordinate of the second point P′ of intersection of the tangent
line with E(R). The y-coordinate of P′ is therefore y′ = m(x3 − x1) + y1.

The reflection of P′ through the x-axis is the point P3 = (x3, y3), where y3 =
m(x1 − x3) − y1. Thus,

P1 + P2 = P1 + P1 = 2P1 = P3 = (x3 , y3 ),

where

x3 = m^2 − 2x1 and y3 = m(x1 − x3) − y1,

with m = (3x1^2 + a)/(2y1).
(iv) P1 = P2 , y1 = y2 = 0. As in (iii), there is a unique tangent line to the curve
at the point P1 , and in this case the tangent line is vertical and intersects the
curve at O; reflecting O through the x-axis yields O and we define

P1 + P2 = P1 + P1 = 2P1 = O.
(v) P2 = O. As in (ii), the line through P1 and O is vertical and intersects E(R) at a point P′
which is the reflection of P1 through the x-axis. Reflecting P′ back through
the x-axis yields P1, so we define
the x-axis yields P1 , so we define

P1 + O = P1 .


Example 13.4.2 Let y^2 = x^3 − x + 6 be the elliptic curve defined over R. Then
P1 = (−2, 0) and P2 = (0, √6) are points of E(R). We have m = √6/2 and so

(−2, 0) + (0, √6) = (3/2 + 2, (√6/2)(−2 − 7/2)) = (7/2, −11√6/4).

Moreover,

2P1 = 2(−2, 0) = O,

and
2P2 = 2(0, √6) = (1/24, −287√6/288).


The good news is that the formulas of Proposition 13.4.1 extend to any field K
to define a binary operation on an elliptic curve E(K).
Example 13.4.3 Let K = Q. Then y^2 = x^3 − 4x = x(x + 2)(x − 2) defines an
elliptic curve E(Q). It is easy to see that (0, 0), (2, 0), (−2, 0), and O are points

of E(Q). In fact, by Washington [63, Chapter 8, Example 8.5], these are the only
points of E(Q). We have

(0, 0) + (−2, 0) = (2, 0),

2(2, 0) = O,

and

(0, 0) + (2, 0) = (−2, 0).




Example 13.4.4 Let K = F5. Then y^2 = x^3 + 4x + 1 defines an elliptic curve over
F5. As we have seen in Example 13.2.3,

E(F5 ) = {(0, 1), (0, 4), (1, 1), (1, 4), (3, 0), (4, 1), (4, 4), O}.

We have

(3, 0) + (4, 1) = (4, 4),

and

2(1, 1) = (4, 1).
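The addition formulas of Proposition 13.4.1 translate directly into code. Below is a minimal Python sketch (the helper name `ec_add` is ours, not the author's) of the five cases over a prime field, checked against the two sums computed in Example 13.4.4:

```python
# Point addition on E(F_p) for y^2 = x^3 + a*x + b, following the five cases
# of Proposition 13.4.1.  The point at infinity O is represented by None.
def ec_add(P1, P2, a, p):
    if P1 is None:                        # case (v): O + P = P
        return P2
    if P2 is None:
        return P1
    x1, y1 = P1
    x2, y2 = P2
    if x1 == x2 and (y1 + y2) % p == 0:   # cases (ii) and (iv): vertical line
        return None
    if P1 == P2:                          # case (iii): tangent-line slope
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:                                 # case (i): chord slope
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    return (x3, (m * (x1 - x3) - y1) % p)

# Example 13.4.4: y^2 = x^3 + 4x + 1 over F_5
print(ec_add((3, 0), (4, 1), 4, 5))   # (4, 4)
print(ec_add((1, 1), (1, 1), 4, 5))   # (4, 1)
```

The three-argument `pow(·, −1, p)` computes a modular inverse (Python 3.8+), which handles the divisions in the slope formulas.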




Proposition 13.4.5 (Elliptic Curve Group) Let K be a field and let E(K) be an
elliptic curve over K. Then E(K) is an abelian group under the binary operation
defined by the formulas of Proposition 13.4.1.
Proof To show that E(K) is a group, we check that the conditions of Defini-
tion 5.1.1 are satisfied. To this end, we claim that the binary operation given in
Proposition 13.4.1 is associative. This can be verified directly, but the calculation
is lengthy. L. C. Washington proves the associative property using projective space,
see [63, Chapter 2, Section 2.4].
For an identity element, we take O. By Proposition 13.4.1(v),

O +P =P =P +O

for all P ∈ E(K), and so O serves as a left and right identity element in E(K).
Let P = (x, y) ∈ E(K). Then the point P′ = (x, −y) is on the curve E(K). By
Proposition 13.4.1(ii),

P + P′ = O = P′ + P

and so there exists a left and right inverse element P′ for P, which we denote by
−P.
Thus E(K) is a group. It is straightforward to check that the binary operation is
commutative, and thus E(K) is an abelian group.



13.4.1 Structure of E(K)

The structure of the group E(K) depends on the field K. We consider the case where
K = Q or where K is a finite field.
Theorem 13.4.6 (Mordell–Weil Theorem) Let E(Q) be an elliptic curve group
over Q. Then E(Q) is a finitely generated abelian group. Thus

E(Q) ≅ Z_{p1^e1} × Z_{p2^e2} × ⋯ × Z_{pr^er} × Z^t,

where the pi are primes, not necessarily distinct, ei ≥ 1, and t ≥ 0.


Proof The proof is beyond the scope of this book. The interested reader is referred
to [63, Chapter 8, Section 8.3, Theorem 8.17]. 

Example 13.4.7 Let E(Q) be the elliptic curve group defined by y^2 = x^3 − 4x =
x(x + 2)(x − 2). As we have seen in Example 13.4.3,

E(Q) = {(0, 0), (2, 0), (−2, 0), O},

and thus, E(Q) is a finitely generated abelian group. In fact,

E(Q) ≅ Z2 × Z2.


Example 13.4.8 Let E(Q) be the elliptic curve group defined by y^2 = x^3 − 25x =
x(x + 5)(x − 5). By the Mordell–Weil theorem, E(Q) is a finitely generated abelian
group. In fact, as shown in [63, Chapter 8, Section 8.4],

E(Q) ≅ Z2 × Z2 × Z.


In cryptography, we are mainly interested in the case where K is a finite field Fq
with q = p^n elements for p prime and n ≥ 1. If K = Fq, then E(Fq) is a finite
abelian group. We review (without proofs) two fundamental results on the structure
of E(Fq ).
Theorem 13.4.9 Let E(Fq ) be an elliptic curve group over Fq . Then

E(Fq) ≅ Zn

for some integer n ≥ 1, or

E(Fq) ≅ Zm × Zn,

for some integers m, n ≥ 1 with m | n.


Proof For a proof, see [63, Chapter 4, Section 4.1, Theorem 4.1]. 

A second fundamental result due to Hasse gives an upper and lower bound on
the number of elements in E(Fq ).
Theorem 13.4.10 (Hasse) Let E(Fq ) be an elliptic curve group over Fq . Then
q + 1 − 2√q ≤ |E(Fq)| ≤ q + 1 + 2√q.

Example 13.4.11 Let K = F5. Then y^2 = x^3 + 4x + 1 defines an elliptic curve
over F5,

E(F5 ) = {(0, 1), (0, 4), (1, 1), (1, 4), (3, 0), (4, 1), (4, 4), O}.

According to Hasse's theorem,

6 − 2√5 ≤ |E(F5)| ≤ 6 + 2√5,

which holds since |E(F5 )| = 8.


By Theorem 13.4.9, either E(F5) ≅ Z8 or E(F5) ≅ Z2 × Z4. Since (0, 1) has
order 8, we conclude that

E(F5) = ⟨(0, 1)⟩ ≅ Z8.


Example 13.4.12 Let K = F7. Then y^2 = x^3 + 2 defines an elliptic curve over F7
with group

E(F7 ) = {(0, 3), (0, 4), (3, 1), (3, 6), (5, 1), (5, 6), (6, 1), (6, 6), O}.

According to Hasse's theorem,

8 − 2√7 ≤ |E(F7)| ≤ 8 + 2√7,

which holds since |E(F7 )| = 9.


By Theorem 13.4.9, either E(F7) ≅ Z9 or E(F7) ≅ Z3 × Z3. But since 3P = O
for all P ∈ E(F7), we have

E(F7) ≅ Z3 × Z3.
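For a field as small as F7 the group can be enumerated by brute force, which makes Hasse's bound concrete. The Python sketch below is our own helper, not the author's code:

```python
import math

# Enumerate the affine points of y^2 = x^3 + 2 over F_7 and verify Hasse's
# bound: q + 1 - 2*sqrt(q) <= |E(F_q)| <= q + 1 + 2*sqrt(q).
p, a, b = 7, 0, 2
points = [(x, y) for x in range(p) for y in range(p)
          if (y * y) % p == (x ** 3 + a * x + b) % p]
order = len(points) + 1     # + 1 for the point at infinity O
print(sorted(points))
print(order)                # 9
assert p + 1 - 2 * math.sqrt(p) <= order <= p + 1 + 2 * math.sqrt(p)
```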



13.5 The Elliptic Curve Key Exchange Protocol

The Elliptic Curve Key Exchange Protocol (ECKEP) is essentially the DHKEP with
an elliptic curve group in place of the group U (Zp ).
Protocol 13.5.1 (Elliptic Curve Key Exchange Protocol)
Premise: Alice and Bob choose an elliptic curve group E(Fp ) over a finite
field Fp , where p is a large prime. Alice and Bob share a point
P ∈ E(Fp ) with a large order r.
Goal: Alice and Bob share a secret random point of E(Fp ).

1. Alice randomly chooses an integer m, 0 ≤ m ≤ r − 1. She computes

Q = mP

and sends Q to Bob.


2. Bob randomly chooses an integer n, 0 ≤ n ≤ r − 1. He computes

R = nP

and sends R to Alice.


3. Bob computes

S = nQ = n(mP ) = nmP .

4. Alice computes

T = mR = m(nP ) = mnP = nmP .

5. Alice and Bob share the secret random point S = mnP .




Example 13.5.2 Alice and Bob choose the elliptic curve group E(F5) defined by
y^2 = x^3 + x + 1. In fact, E(F5) ≅ Z9 and is generated by the point P = (0, 1)
(Section 13.6, Exercise 9).
Alice and Bob decide to share the point P = (0, 1), which has order r = 9.
Alice randomly chooses integer m = 3 and computes

Q = 3P = 3(0, 1) = (2, 1)

and sends Q to Bob.


Bob randomly chooses integer n = 5, computes

R = 5P = 5(0, 1) = (3, 1),



and sends R to Alice.


Bob computes

5Q = 5(2, 1) = 15(0, 1) = 6(0, 1) = (2, 4),

and Alice computes

3R = 3(3, 1) = 15(0, 1) = 6(0, 1) = (2, 4).
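The exchange in Example 13.5.2 can be replayed in a few lines of Python. The sketch below (helper names `ec_add` and `ec_mul` are ours) computes scalar multiples by repeated addition, which is fine at this tiny size:

```python
# Replay Example 13.5.2: ECKEP on y^2 = x^3 + x + 1 over F_5, P = (0, 1).
def ec_add(P1, P2, a, p):        # Proposition 13.4.1 over F_p; None plays O
    if P1 is None: return P2
    if P2 is None: return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P1 == P2:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    return (x3, (m * (x1 - x3) - y1) % p)

def ec_mul(n, P, a, p):          # nP by repeated addition
    Q = None
    for _ in range(n):
        Q = ec_add(Q, P, a, p)
    return Q

a, p, P = 1, 5, (0, 1)
Q = ec_mul(3, P, a, p)           # Alice sends Q = 3P
R = ec_mul(5, P, a, p)           # Bob sends R = 5P
S = ec_mul(5, Q, a, p)           # Bob's shared point 5Q = 15P
T = ec_mul(3, R, a, p)           # Alice's shared point 3R = 15P
print(Q, R, S, T)                # (2, 1) (3, 1) (2, 4) (2, 4)
```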




We show that point multiplication nP for n ≥ 1 can be computed efficiently.
This will show that the ECKEP is a practical method of key exchange.
Algorithm 13.5.3 (POINT_MULT)
Input: an elliptic curve group E(Fp ) and a point P ∈ E(Fp ),
n encoded in binary as bm bm−1 · · · b1
Output: Q = nP
Algorithm:
Q←O
for i = 1 to m
if bi = 1 then Q ← Q + P
P ← 2P
next i
output Q


To see how POINT_MULT works, we compute

6P = 6(0, 1)

in the elliptic curve group E(F5 ) of Example 13.4.11. In this case, P = (0, 1),
n = 6, which in binary is 110, and so m = 3.
On the first iteration of the loop, Q remains O, and P = (0, 1) doubles to become

P = 2(0, 1) = (4, 1).

On the second iteration, Q becomes

Q + P = O + P = (4, 1),

and P doubles again as

P = 2(4, 1) = (3, 0).

On the third (last) iteration, Q becomes

Q + P = (4, 1) + (3, 0) = (4, 4),



which is then output as the correct result

Q = 6P = (4, 4).
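The walk-through above can be transcribed directly into Python. In this sketch (function names are ours), the bits of n are consumed least-significant first, exactly as in Algorithm 13.5.3:

```python
def ec_add(P1, P2, a, p):        # Proposition 13.4.1 over F_p; None plays O
    if P1 is None: return P2
    if P2 is None: return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P1 == P2:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    return (x3, (m * (x1 - x3) - y1) % p)

def point_mult(n, P, a, p):
    Q = None                     # Q <- O
    while n > 0:
        if n & 1:                # current bit b_i is 1
            Q = ec_add(Q, P, a, p)
        P = ec_add(P, P, a, p)   # P <- 2P
        n >>= 1
    return Q

# 6P = 6(0, 1) on y^2 = x^3 + 4x + 1 over F_5, as in the walk-through:
print(point_mult(6, (0, 1), 4, 5))   # (4, 4)
```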

Proposition 13.5.4 POINT_MULT runs in time O(m^4), where m = log2(p) + 1.


Proof The algorithm performs m iterations of the for-next loop. On each iteration,
the algorithm performs at most 2 steps; each step is an addition of two points of
E(Fp ). An examination of the formulas for point addition (Proposition 13.4.1)
reveals that addition of points in E(Fp ) requires at most 20 additions, subtractions,
multiplications, and divisions in Fp. Thus each iteration costs O(m^3) steps.
Consequently, the running time of POINT_MULT is O(m^4).


Remark 13.5.5 Similar to Algorithm 6.4.1, Algorithm 13.5.3 is analogous to
Algorithm 4.2.6 with 2P playing the role of 2a.


There is an analog for the DHDLP:
Definition 13.5.6 The Elliptic Curve Discrete Logarithm Problem (ECDLP) is
stated as follows: given P ∈ E(Fp), Q ∈ H = ⟨P⟩ ≤ E(Fp), n = |H|, find the
unique value of m, 0 ≤ m ≤ n − 1, for which

Q = mP .

Like the DHKEP, the security of the ECKEP depends on the fact that the ECDLP
is difficult to solve.
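At the sizes used in our examples, the ECDLP can of course be solved by exhaustive search, which is worth seeing once to appreciate why large group orders matter. A Python sketch (helper names are ours):

```python
def ec_add(P1, P2, a, p):        # Proposition 13.4.1 over F_p; None plays O
    if P1 is None: return P2
    if P2 is None: return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P1 == P2:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    return (x3, (m * (x1 - x3) - y1) % p)

def naive_ecdlp(P, Q, a, p, order):
    """Try m = 0, 1, 2, ... until mP = Q; costs O(order) point additions."""
    R = None
    for m in range(order):
        if R == Q:
            return m
        R = ec_add(R, P, a, p)
    return None

# In E(F_5) of Example 13.4.11 (y^2 = x^3 + 4x + 1): solve m(0, 1) = (3, 0).
print(naive_ecdlp((0, 1), (3, 0), 4, 5, 8))   # 4
```

The loop is linear in the group order; even the best generic algorithm (BSGS) only improves this to roughly the square root of the order, which is the theme of the next subsection.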

13.5.1 Comparing ECKEP and DHKEP

So far we have two protocols for the distribution of keys: the Diffie–Hellman key
exchange, which uses the group U(Zp) = F×p, and the Elliptic Curve key exchange,
which employs the group E(Fp).
The elliptic curve group E(Fp) does seem more complicated than the group
F×p. The elements of E(Fp) require two elements of Fp for their description,
and the group operation in E(Fp) involves more steps than the simple modulo p
multiplication in F×p. What is the major advantage in choosing the ECKEP over the
DHKEP?
To attack the ECKEP by solving the ECDLP, one is limited to the algorithms
that solve the general DLP, namely, NAIVE_DLP and BSGS, which run in time
O(√p log2(p)) at best.
The Index Calculus method, with its faster subexponential running time, is
effective in attacking the DHKEP but can only be applied to the DHDLP; currently
there is no analog of the Index Calculus method for the ECDLP.

So, the fastest known algorithm for solving the ECDLP runs in time
O(√p log2(p)); faster methods cannot be applied to the ECDLP.
For more discussion of this matter, see [33, Chapter 6, Section 4.5] and [7,
Chapter V, Section V.6].
Despite the advantage of ECKEP over DHKEP, care must be taken when
selecting an elliptic curve group.

13.5.2 What Elliptic Curves to Avoid

Let E(Fq) be an elliptic curve over Fq, q = p^m, p prime, m ≥ 1. By Hasse's
theorem (Theorem 13.4.10),

q + 1 − 2√q ≤ |E(Fq)| ≤ q + 1 + 2√q,

thus |E(Fq)| ≈ q + 1. The integer t for which

|E(Fq)| + t = q + 1

is the trace of E(Fq) at q.


Let P be a point of E(Fq) of order n ≥ 1. Assume that gcd(n, q) = 1 and let
⟨P⟩ denote the cyclic subgroup of E(Fq) generated by P; we have n = |⟨P⟩|. Let
Q ∈ ⟨P⟩. We want to solve the ECDLP

xP = Q. (13.6)

The ECDLP (13.6) can be solved using the MOV attack, which is named for its authors
Menezes et al. [38]. We give a brief outline of the MOV attack below. For complete
details, see [63, Chapter 5, Section 5.3] and [7, Chapter V, Section V.2].

The MOV Attack

Let F̄q denote an algebraic closure of Fq. Let n ≥ 1, and let

E(n) = {P ∈ E(F̄q) : the order of P in E(F̄q) divides n}.



Using the formula

F̄q = ⋃_{i=1}^∞ F_{q^i},

one sees that there exists a smallest positive integer m for which

E(n) ⊆ E(F_{q^m}).

Thus, by Washington [63, Corollary 3.11], F_{q^m} contains the group μn of the nth
roots of unity, i.e., F_{q^m} contains all of the roots of the equation x^n − 1 = 0. Since μn is
a subgroup of F×_{q^m},

q^m ≡ 1 (mod n),

and thus m is a multiple of the order of (q mod n) in U(Zn).


Once the value of m has been determined, the MOV attack solves the ECDLP by
solving the DLP in F×_{q^m}. As we have indicated, there is an Index Calculus algorithm
for solving the DLP in F×_{p^m} [20].
If m is relatively small, then solving the DLP in F×_{p^m} is a much easier problem
than solving (13.6).
One case where m is small occurs when the elliptic curve is supersingular.

Supersingular Curves

An elliptic curve E(Fq ) is supersingular if p | t, where t is the trace of the curve.


If q = p and p ≥ 5, then E(Fp ) is supersingular if and only if t = 0 (use Hasse’s
theorem).
For example, the elliptic curve E(F151 ) defined by

y^2 = x^3 + 2x

is supersingular, |E(F151 )| = 152 and t = 0.


Let E(Fp), p ≥ 5, be a supersingular elliptic curve, and suppose that P is a point
of E(Fp) of order n. Then n divides p + 1 = |E(Fp)| and so gcd(n, p) = 1.
Now, by Washington [63, Proposition 5.3], E(n) ⊆ E(F_{p^2}) and so m = 1 or 2.
So, under the MOV attack, solving the ECDLP in ⟨P⟩ is no harder than solving
the DLP in F×_{p^2}.
Thus for use in ECKEP, an elliptic curve E(Fp ), p ≥ 5, should not be
supersingular, else we lose the main advantage of ECKEP over DHKEP: there are
no Index Calculus methods for ECDLP.

Anomalous Curves

To assure that the MOV attack cannot be employed, one strategy is to choose curves
that are anomalous, i.e., satisfy

|E(Fq )| = q,

where q = p^m (that is, the trace t = 1).


If E(Fq) is anomalous and P ∈ E(Fq) is a point of order n, then n | q. Hence
there is no m ≥ 1 for which

q^m ≡ 1 (mod n).

Consequently, there is no embedding E(n) ⊆ E(F_{q^m}), and thus the MOV attack
cannot be used to solve the ECDLP.
However, in the case that E(Fp ) is an anomalous curve for p prime, an algorithm
has been found that solves the ECDLP in polynomial time. The algorithm uses the
field of p-adic rationals, Qp ; see [63, Chapter 5, Section 5.4] and [7, Chapter V,
Section V.3]. Thus anomalous curves must also be avoided when choosing an elliptic
curve for ECKEP.
Example 13.5.7 Let E(F43 ) be the elliptic curve defined by

y^2 = x^3 + 10x + 36

over F43 . Then E(F43 ) has exactly 43 points, and hence E(F43 ) is anomalous.


Here is a GAP program that will compute the 42 non-trivial points of E(F43 ).
q:=List([0..42],i->(i^3+10*i+36) mod 43);
for j in [1..43] do
  if Legendre(q[j],43)=1 then
    Print("(",j-1,",",RootMod(q[j],43),")",",");
    Print("(",j-1,",",-1*RootMod(q[j],43) mod 43,")",",");
  fi;
od;

13.5.3 Examples of Good Curves

As we have seen, in selecting a curve to use in the ECKEP, we need to avoid


supersingular and anomalous curves.
In this section we give examples of elliptic curves over Fp that are appropriate
for use in the ECKEP. These curves are given in the form

y^2 = x^3 + ax + b

and can be found in [7, Appendix A]. Our computations are done using GAP.
Example 13.5.8 We take

p = 2^130 + 169 = 1361129467683753853853498429727072845993,

which is a 131-bit prime. We let

a = 3,
b = 1043498151013573141076033119958062900890,

which defines the elliptic curve

y^2 = x^3 + 3x + 1043498151013573141076033119958062900890,

with group E(Fp ).


Now,

|E(Fp )| = 1361129467683753853808807784495688874237,

which is a 130-bit prime.


The trace of E(Fp ) is

t = p + 1 − |E(Fp )| = 44690645231383971757,

and thus E(Fp ) is not supersingular.


If P is any non-trivial point in E(Fp ), then it has order

n = |E(Fp )| = 1361129467683753853808807784495688874237,

thus

⟨P⟩ = E(Fp).

Now, gcd(n, p) = 1 and the order of (p mod n) in U (Zn ) is

l = 680564733841876926904403892247844437118.

Thus, when applying the MOV attack, the smallest integer m for which

E(n) ⊆ E(F_{p^m})

is at least

680564733841876926904403892247844437118.

Thus, it is infeasible to use the MOV attack to solve the ECDLP in E(Fp).
The curve E(Fp) is not anomalous (since t ≠ 1). So the approach in the
anomalous case will not work, either.
The best we can do to solve this ECDLP is to use the Baby-Step/Giant-Step
(BSGS) algorithm, which will solve the DLP in time O(√n log2(n)).
To use BSGS to solve the DLP, our computer would have to perform ≈ 2^65 ·
130 ≈ 2^72 basic operations, which is beyond the number of operations that a
computer can do in a reasonable amount of time.
We can use GAP to compute an explicit example of a hard DLP in E(Fp ). For
instance, one can use GAP to show that the point

P = (0, 1314511337629110386987830837960619486151)

is in E(Fp ) and thus generates E(Fp ). Moreover,

Q = (2, 466980356246987135141837142580539177759) ∈ E(Fp ).

Thus the ECDLP

xP = Q

is hard.
The relevant GAP code is
p:=2^130+169; #a prime
a:=3;
b:=1043498151013573141076033119958062900890; #curve parameters
n:=1361129467683753853808807784495688874237; #the order of the elliptic curve group (a prime)
t:=p+1-n; #the trace
l:=OrderMod(p,n); #the order of p modulo n
x1:=0;
s1:=(x1^3+a*x1+b) mod p;
y1:=RootMod(s1,p); #P=(x1,y1) generates the elliptic curve group
x2:=2;
s2:=(x2^3+a*x2+b) mod p;
y2:=RootMod(s2,p); #Q=(x2,y2) is the second point

Example 13.5.9 We take

p = 2^160 + 7 = 1461501637330902918203684832716283019655932542983,

which is a 161-bit prime.


We let

a = 10,
b = 1343632762150092499701637438970764818528075565078,

which defines the elliptic curve

y^2 = x^3 + 10x + 1343632762150092499701637438970764818528075565078,

with group E(Fp ).


Now,

|E(Fp )| = 1461501637330902918203683518218126812711137002561,

which is a 160-bit prime.


The trace of E(Fp ) is

t = p + 1 − |E(Fp )| = 1314498156206944795540423,

and thus E(Fp ) is not supersingular.


If P is any non-trivial point in E(Fp ), then it has order

n = |E(Fp )| = 1461501637330902918203683518218126812711137002561,

thus

⟨P⟩ = E(Fp).

Now, gcd(n, p) = 1 and the order of (p mod n) in U (Zn ) is

l = 730750818665451459101841759109063406355568501280.

Thus, when applying the MOV attack, the smallest integer m for which

E(n) ⊆ E(F_{p^m})

is at least

730750818665451459101841759109063406355568501280,

and so, it is infeasible to use the MOV attack to solve the ECDLP in E(Fp ).
The curve E(Fp) is not anomalous (since t ≠ 1). So the approach in the
anomalous case will not work, either.
The best we can do to solve this ECDLP is to use the BSGS algorithm, which
will solve the DLP in time O(√n log2(n)).
Fortunately (from a security standpoint), to use BSGS to solve the DLP, our
computer would have to perform ≈ 2^80 · 160 ≈ 2^87 basic operations, which is
beyond the number of operations that a computer can do in a reasonable amount of
time.
We conclude that E(Fp ) is a good curve for an ECKEP application.

13.6 Exercises

1. Show that y^2 = x^3 − 3x − 1 defines an elliptic curve over Q. Does this
equation define an elliptic curve over F3?
2. Determine whether y^2 = x^3 − (1/3)x + 2/27 defines an elliptic curve over Q.
3. Let E(Q) be the elliptic curve group given by the equation y^2 = x^3 + 17.
(a) Show that P = (−1, 4) and Q = (4, 9) are elements of E(Q).
(b) Compute P + Q and 2P in E(Q).
4. Let E(Q) be the elliptic curve group defined by the equation y^2 = x^3 − 43x +
166. Show that P = (3, 8) generates a subgroup of order 7 in E(Q).
5. Let E(F5) be the elliptic curve group defined by the equation y^2 = x^3 + 2.
(a) Show that E(F5 ) is an abelian group with 2 ≤ |E(F5 )| ≤ 11.
(b) Construct all of the elements of E(F5 ).
(c) Find a group isomorphic to E(F5 ) of the form Zm × Zn , m | n.
6. Find the smallest prime p so that the elliptic curve group E(Fp ) has at least
100 elements. Note: once such a prime p is found, you should show that there
exists an elliptic curve over Fp .
7. Alice and Bob are using the ECKEP with the group E(F5 ) of Example 13.4.11.
Alice and Bob share the point P = (0, 1) of order 8. Alice randomly chooses
x = 3 and Bob randomly chooses y = 7. Compute the point shared by Alice
and Bob.
8. Let E(F5 ) be the group of Example 13.4.11. Let P = (0, 1), Q = (3, 0). Solve
the DLP mP = Q.
9. Let E(F5 ) be the elliptic curve group of Example 13.5.2.
(a) Compute all of the elements of E(F5 ).
(b) Prove that E(F5) ≅ Z9 and that E(F5) is generated by (0, 1).
10. Let y^2 = x^3 + 547x + 21 be the elliptic curve over F557. Write a GAP program
to compute all of the non-trivial points of E(F557 ).
11. Consider the ElGamal public key cryptosystem of Section 9.4. Devise an
elliptic curve ElGamal public key cryptosystem with a cyclic subgroup of
E(Fp ) in place of U (Zl ), l prime.
As a first step, choose an elliptic curve group E(Fp) and a point P ∈ E(Fp)
of order r. The subgroup ⟨P⟩ plays the role of U(Zl); P plays the role of a
primitive root modulo l.
Chapter 14
Singular Curves

Let K be a field not of characteristic 2. Let K × denote the multiplicative group of


non-zero elements of K. Let

y^2 = x^3 + ax + b

be a singular curve and assume that the cubic x^3 + ax + b has a double root. Then
the curve does not define an elliptic curve group. The set of non-singular points,
Ens (K), however, is still a group under the point addition of Proposition 13.4.1.
In this chapter, we study the structure of the group Ens (K). The group Ens (K) is
certainly of interest mathematically and may yet yield applications to cryptography,
for instance, see [63, Section 2.9].

14.1 The Group Ens (K)

To define a group structure on Ens (K), we begin by rewriting the singular curve that
defines Ens (K).
Proposition 14.1.1 Let K be a field of characteristic not 2 or 3. Let y^2 = x^3 +
ax + b be a singular curve in which x^3 + ax + b has a double root in K. Then the
curve can be written as

y^2 = x^2(x + c)

for some c ∈ K×.
Proof Since y^2 = x^3 + ax + b is singular, it does not define an elliptic curve over
K. Since x^3 + ax + b has a double root in K, the cubic is reducible over K. Thus
x^3 + ax + b = (x − r)q(x), r ∈ K, where q(x) is a quadratic over K. Either r is the

© Springer Nature Switzerland AG 2022 297


R. G. Underwood, Cryptography for Secure Encryption, Universitext,
https://doi.org/10.1007/978-3-030-97902-7_14

double root, so that x^3 + ax + b = (x − r)^2(x − s) for some s ∈ K, s ≠ r, or the
quadratic factor q(x) has the double root as its root. But then q(x) must be reducible,
hence q(x) = (x − s)^2, s ∈ K, s ≠ r. In any case (relabeling r and s if necessary),
if x^3 + ax + b has a double root, then

x^3 + ax + b = (x − r)^2(x − s)

for r, s ∈ K, r ≠ s.
Let x′ = x − r. Then we obtain the curve

y^2 = (x′)^2(x′ + r − s),

or

y^2 = x^2(x + c),

where c = r − s ≠ 0.


The only singular point of the curve y^2 = x^2(x + c) is (0, 0).
Let Ens(K) be the collection of non-singular points of the curve y^2 = x^2(x + c),
c ≠ 0.
Proposition 14.1.2 (Binary Operation on Ens(K)) Let K be a field not of characteristic 2 or 3, and let Ens(K) denote the set of non-singular points of the curve
y^2 = x^2(x + c), c ≠ 0. Let P1 = (x1, y1), P2 = (x2, y2) be points of Ens(K). There
exists a binary operation on Ens(K) defined as follows. (Note: these formulas are
not the same as those given in Proposition 13.4.1.)
(i) If P1 ≠ P2, x1 ≠ x2, then

P1 + P2 = P3 = (x3 , y3 ),

where

x3 = m^2 − c − x1 − x2 and y3 = m(x1 − x3) − y1,

with m = (y2 − y1)/(x2 − x1).
(ii) If P1 ≠ P2, x1 = x2, y1 ≠ y2, then

P1 + P2 = O.

(iii) If P1 = P2, y1 = y2 ≠ 0, then

P1 + P2 = P1 + P1 = 2P1 = P3 = (x3 , y3 ),

where

x3 = m^2 − c − 2x1 and y3 = m(x1 − x3) − y1,

with m = (3x1^2 + 2cx1)/(2y1).
(iv) If P1 = P2 , y1 = y2 = 0, then

P1 + P2 = P1 + P1 = 2P1 = O.

(v) If P2 = O, then

P1 + O = P1 .

Proof See Section 14.5, Exercise 1. 



Now, under the binary operation of Proposition 14.1.2, Ens(K) is an additive
abelian group.
Let Ens(K) be the group of non-singular points on the curve y^2 = x^2(x + c),
c ≠ 0. Let β^2 = c for some β in an algebraic closure of K. Let K(β) be the simple
algebraic extension field of K.
Proposition 14.1.3 If β ∉ K, then the subset

Jc = {u + βv : u, v ∈ K, u^2 − cv^2 = 1} ⊆ K(β)

is a subgroup of K(β)× under the multiplication of K(β).


Proof See Section 14.5, Exercise 4.


We have the following characterization of Ens (K) due to L. C. Washington [63,
Theorem 2.30].
Theorem 14.1.4 Let K be a field not of characteristic 2 or 3, and let Ens(K) be the
group of non-singular points of y^2 = x^2(x + c), c ≠ 0. Let β^2 = c for β in an
algebraic closure of K.
(i) If β ∈ K, then Ens(K) ≅ K×.
(ii) If β ∉ K, then Ens(K) ≅ Jc.
Proof
For (i): there is a group isomorphism

ψ : Ens(K) → K×

defined by ψ(O) = 1, and

ψ(x, y) = (y + βx)/(y − βx).

The inverse of ψ is given as

ψ −1 : K × → Ens (K),

with ψ⁻¹(1) = O, and ψ⁻¹(r) = (x, y), where

x = 4cr/(r − 1)^2,   y = 4βcr(r + 1)/(r − 1)^3,

for r ≠ 1. For details, see [63, Theorem 2.30(i)].


For (ii): in this case, there is a group isomorphism

ψ : Ens (K) → Jc ≤ K(β)×

defined by ψ(O) = 1, and ψ(x, y) = u + βv, where

u = (y^2 + cx^2)/(y^2 − cx^2),   v = 2xy/(y^2 − cx^2).

The inverse of ψ is given as

ψ −1 : Jc → Ens (K),

with ψ⁻¹(1) = O, ψ⁻¹(−1) = (−c, 0), and ψ⁻¹(u + βv) = (x, y), where

x = ((u + 1)/v)^2 − c,   y = ((u + 1)/v) · x,

for u + βv ≠ ±1. For details, see [63, Theorem 2.30(ii)].




Example 14.1.5 Let K = F7, and let Ens(F7) be the group defined by y^2 = x^2(x + 2),

Ens(F7) = {O, (2, 4), (2, 3), (5, 0), (6, 1), (6, 6)}.

We have 3^2 ≡ 2 (mod 7), thus β = 3 ∈ F7. Thus Theorem 14.1.4(i) applies to
give an isomorphism

ψ : Ens(F7) → F×7

defined by ψ(O) = 1, and

ψ(x, y) = (y + 3x)/(y − 3x).

We have

ψ(O) = 1, ψ(2, 4) = 2, ψ(2, 3) = 4,

ψ(5, 0) = 6, ψ(6, 1) = 3, ψ(6, 6) = 5.
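These six values can be verified mechanically. A Python sketch of ψ (our transcription; the modular inverse is computed with three-argument `pow`):

```python
# psi : E_ns(F_7) -> F_7^x for y^2 = x^2(x + 2), with beta = 3 (3^2 = 2 in F_7)
p, beta = 7, 3

def psi(P):
    x, y = P
    return (y + beta * x) * pow(y - beta * x, -1, p) % p

table = {P: psi(P) for P in [(2, 4), (2, 3), (5, 0), (6, 1), (6, 6)]}
print(table)
# {(2, 4): 2, (2, 3): 4, (5, 0): 6, (6, 1): 3, (6, 6): 5}
```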

14.2 The DLP in Ens (K)

Theorem 14.1.4(i) shows that there is no great advantage in using Ens(K) over K×
in a cryptographic application: if β ∈ K, then solving the DLP in Ens(K) is no
harder than solving the DLP in K×.
Let Ens(K) be defined by y^2 = x^2(x + c), where c ≠ 0. Suppose that β^2 = c
for some β ∈ K. Let P ∈ Ens(K), and let ⟨P⟩ be the cyclic subgroup of Ens(K)
generated by P. Let Q ∈ ⟨P⟩. We seek to solve the DLP

mP = Q. (14.1)

But this is easy if we apply the isomorphism ψ of Theorem 14.1.4(i) to both sides
of (14.1) to obtain

ψ(P)^m = ψ(Q),

which is a DLP in K × and is usually quite simple to solve.


Example 14.2.1 Let K = R, and let Ens(R) be defined by y^2 = x^2(x + 4). Then
c = 4 and β = 2. By Theorem 14.1.4(i), there is a group isomorphism

ψ : Ens(R) → R×,

given as

(x, y) → (y + 2x)/(y − 2x).

It is known that P = (32, 192) ∈ Ens(R) and that Q = (512/961, 33792/29791) ∈ ⟨P⟩. We
want to solve the DLP

m(32, 192) = (512/961, 33792/29791).    (14.2)

Now,

ψ(32, 192) = (192 + (2 · 32))/(192 − (2 · 32)) = 2,

and

ψ(512/961, 33792/29791) = (33792/29791 + 2 · (512/961)) / (33792/29791 − 2 · (512/961)) = 32.

Thus the DLP (14.2) becomes the DLP

2^m = 32,

which is easily solved by m = 5. Consequently,

5P = Q.


Here is an example involving finite fields.
Example 14.2.2 Let K = F7, and let Ens(F7) be defined by y^2 = x^2(x + 2). As in
Example 14.1.5,

Ens (F7 ) = {O, (2, 4), (2, 3), (5, 0), (6, 1), (6, 6)},

β = 3 ∈ F7. Thus Theorem 14.1.4(i) applies to give an isomorphism

ψ : Ens(F7) → F×7

defined by ψ(O) = 1, and

ψ(x, y) = (y + 3x)/(y − 3x).

We want to solve the DLP

m(6, 1) = (5, 0). (14.3)

Now,

ψ(6, 1) = 3, ψ(5, 0) = 6.

Thus the DLP (14.3) becomes the DLP

3^m = 6,

which is solved by m = 3. Consequently,

3(6, 1) = (5, 0).
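The same computation can be scripted: push the problem into F×7 with ψ and search the powers of the image. A Python sketch (helper names are ours):

```python
p, beta = 7, 3                   # y^2 = x^2(x + 2) over F_7, beta^2 = 2

def psi(P):                      # the isomorphism of Theorem 14.1.4(i)
    x, y = P
    return (y + beta * x) * pow(y - beta * x, -1, p) % p

g, h = psi((6, 1)), psi((5, 0))  # the DLP m(6, 1) = (5, 0) becomes g^m = h
m = next(k for k in range(1, p) if pow(g, k, p) == h)
print(g, h, m)                   # 3 6 3
```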





14.3 The Group Gc (K)

Let K be a field of characteristic not 2, and let c ∈ K×. In this section we show that
the points on the curve

x^2 − cy^2 = 1

form an additive abelian group.


Theorem 14.3.1 Let K be a field of characteristic not 2, and let c ∈ K × . Let

Gc(K) = {(x, y) ∈ K × K : x^2 − cy^2 = 1}.

Then Gc (K) is an abelian group under the binary operation

+ : Gc (K) × Gc (K) → Gc (K)

defined as follows. For P1 = (x1 , y1 ), P2 = (x2 , y2 ) ∈ Gc (K),

P1 + P2 = P3 = (x3 , y3 ),

where x3 = x1 x2 + cy1 y2 and y3 = x1 y2 + y1 x2 .


Proof See Section 14.5, Exercise 6.


Example 14.3.2 Let K = R, c = −1. Then

G−1(R) = {(x, y) ∈ R × R : x^2 + y^2 = 1}

with binary operation

P1 + P2 = P3 = (x3 , y3 ),

where x3 = x1x2 − y1y2 and y3 = x1y2 + y1x2. For instance, let P1 = (4/5, 3/5),
P2 = (√2/2, √2/2). Then

x3 = (4/5) · (√2/2) − (3/5) · (√2/2) = √2/10,

y3 = (4/5) · (√2/2) + (√2/2) · (3/5) = 7√2/10.

Fig. 14.1 Addition of points P1 + P2 = P3 and computation of −P1 in the circle group

Thus

(4/5, 3/5) + (√2/2, √2/2) = (√2/10, 7√2/10).

Also, −(4/5, 3/5) = (4/5, −3/5). The group G−1(R) is the circle group. See Figure 14.1.


A point P ∈ G−1 (R) in the circle group can be given as

P = (cos(α), sin(α)),

for some α, 0 ≤ α < 2π .


Now, the sum of points

P1 + P2 = (cos(α), sin(α)) + (cos(β), sin(β)) = P3

is given by the formulas

x3 = cos(α) cos(β) − sin(α) sin(β),

y3 = cos(α) sin(β) + sin(α) cos(β),

for some α, β, 0 ≤ α, β < 2π.


In Figure 14.1, if m(∠P1OQ) = α and m(∠P2OQ) = β, then m(∠P3OQ) =
α + β. Consequently,

cos(α + β) = cos(α) cos(β) − sin(α) sin(β),

sin(α + β) = cos(α) sin(β) + sin(α) cos(β),

which are familiar trigonometric identities.



Here are some examples of the group Gc (K) over finite fields.
Example 14.3.3 Let K = F11 , c = 3, so that

G3(F11) = {(x, y) ∈ F11 × F11 : x^2 − 3y^2 = 1}

with binary operation

P1 + P2 = P3 = (x3 , y3 ),

where x3 = x1 x2 + 3y1 y2 and y3 = x1 y2 + y1 x2 . We can compute all of the points


of G3 (F11 ) using the following table.

y     3y^2 + 1    x       Points
0     1           ±1      (1, 0), (10, 0)
1     4           ±2      (2, 1), (9, 1)
2     2           none    none
3     6           none    none
4     5           ±4      (4, 4), (7, 4)
5     10          none    none
6     10          none    none
7     5           ±4      (4, 7), (7, 7)
8     6           none    none
9     2           none    none
10    4           ±2      (2, 10), (9, 10)

Thus, G3 (F11 )

= {(1, 0), (10, 0), (2, 1), (9, 1), (4, 4), (7, 4), (4, 7), (7, 7), (2, 10), (9, 10)}.

We have (2, 1) + (4, 7) = (7, 7) and 2(4, 4) = (9, 10). Note that the Legendre symbol
(3/11) = 1; thus 3 is a quadratic residue modulo 11. In fact, G3 (F11 ) ≅ F11^× ≅ Z10 , as we shall soon
see.
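The table, and sums such as (2, 1) + (4, 7) = (7, 7) and 2(4, 4) = (9, 10), can be confirmed by brute force. A sketch of ours (the function names are our own):

```python
# Enumerate G_c(F_p) = {(x, y) : x^2 - c*y^2 = 1 (mod p)} and apply the
# group law x3 = x1*x2 + c*y1*y2, y3 = x1*y2 + y1*x2 from Theorem 14.3.1.
def points(c, p):
    return [(x, y) for x in range(p) for y in range(p)
            if (x * x - c * y * y) % p == 1]

def add(P1, P2, c, p):
    (x1, y1), (x2, y2) = P1, P2
    return ((x1 * x2 + c * y1 * y2) % p, (x1 * y2 + y1 * x2) % p)

G = points(3, 11)
assert len(G) == 10                            # the 10 points of the table
assert all(add(P, Q, 3, 11) in G for P in G for Q in G)   # closure
assert add((2, 1), (4, 7), 3, 11) == (7, 7)
assert add((4, 4), (4, 4), 3, 11) == (9, 10)   # 2(4, 4) = (9, 10)
```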


Example 14.3.4 Let K = F7 , c = 3, so that

G3 (F7 ) = {(x, y) ∈ F7 × F7 : x^2 − 3y^2 = 1}

with binary operation

P1 + P2 = P3 = (x3 , y3 ),

where x3 = x1 x2 + 3y1 y2 and y3 = x1 y2 + y1 x2 . The points of G3 (F7 ) are given


by the following table.

y   3y^2 + 1   x   Points
0 1 ±1 (1, 0), (6, 0)
1 4 ±2 (2, 1), (5, 1)
2 6 None None
3 0 0 (0, 3)
4 0 0 (0, 4)
5 6 None None
6 4 ±2 (2, 6), (5, 6)

Thus

G3 (F7 ) = {(1, 0), (6, 0), (2, 1), (5, 1), (0, 3), (0, 4), (2, 6), (5, 6)}.

Note that the Legendre symbol (3/7) = −1; thus 3 is not a quadratic residue modulo 7. In fact, G3 (F7 )
is isomorphic to the subgroup Z8 of F49^× ≅ Z48 .
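A brute-force check of ours confirms that G3 (F7 ) is cyclic of order 8: repeated addition of (2, 1), for example, returns to the identity (1, 0) only after 8 steps.

```python
# Sketch: G_3(F_7) has 8 elements and contains an element of order 8,
# so it is cyclic, G_3(F_7) ≅ Z_8 (a subgroup of Z_48 ≅ F_49^×).
p, c = 7, 3

def add(P1, P2):
    (x1, y1), (x2, y2) = P1, P2
    return ((x1 * x2 + c * y1 * y2) % p, (x1 * y2 + y1 * x2) % p)

G = {(x, y) for x in range(p) for y in range(p) if (x * x - c * y * y) % p == 1}
assert len(G) == 8

def order(P):
    Q, n = P, 1
    while Q != (1, 0):        # (1, 0) is the identity
        Q = add(Q, P)
        n += 1
    return n

assert order((2, 1)) == 8
assert max(order(P) for P in G) == 8
```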


Remark 14.3.5 Let (x^2 − cy^2 − 1) denote the principal ideal of K[x, y] generated
by x^2 − cy^2 − 1. Then the quotient ring

H = K[x, y]/(x^2 − cy^2 − 1)

is a Hopf algebra over K. The group structure of Gc (K) is a consequence of the


Hopf algebra structure of H .
For details regarding Hopf algebras, the reader is referred to this author’s text
[60].



14.4 Ens (K) ≅ Gc (K)

We now show that the group Gc (K) is essentially the same as the group Ens (K) of
non-singular points.
Proposition 14.4.1 Let K be a field of characteristic not 2 or 3. Let Ens (K) be the
collection of non-singular points of the curve y^2 = x^2 (x + c), c ≠ 0. Let Gc (K) be
the group of points

Gc (K) = {(u, v) ∈ K × K : u^2 − cv^2 = 1}.



Then there is an isomorphism of groups

θ : Ens (K) → Gc (K)

defined as

θ(x, y) = ((y^2 + cx^2)/(y^2 − cx^2), 2xy/(y^2 − cx^2)), θ(O) = (1, 0), θ(−c, 0) = (−1, 0).

Proof Let β^2 = c for some β. First assume that β ∈ K. Then by Theorem 14.1.4(i), there is an isomorphism of groups ψ : Ens (K) → K^× defined by

ψ(x, y) = (y + βx)/(y − βx), ψ(O) = 1.

By Pareigis [41, Section 1, p. 77], there is an isomorphism ρ : K × → Gc (K)


defined by

ρ(t) = ((t + t^(−1))/2, (t − t^(−1))/(2β)).
The composition

θ =ρ◦ψ

is an isomorphism θ : Ens (K) → Gc (K) with

θ(x, y) = (ρ ◦ ψ)(x, y)
        = ρ(ψ(x, y))
        = ρ((y + βx)/(y − βx))
        = ((1/2)((y + βx)/(y − βx) + (y − βx)/(y + βx)), (1/(2β))((y + βx)/(y − βx) − (y − βx)/(y + βx)))
        = ((y^2 + cx^2)/(y^2 − cx^2), 2xy/(y^2 − cx^2)),

and

θ (O) = (ρ ◦ ψ)(O) = ρ(ψ(O)) = ρ(1) = (1, 0).

We next assume that β ∉ K. Let ψ : Ens (K) → Jc ,

(x, y) ↦ (y + βx)/(y − βx) = (y^2 + cx^2)/(y^2 − cx^2) + β · 2xy/(y^2 − cx^2),

Fig. 14.2 The isomorphism θ : Ens (R) → G−1 (R), (x, y) ↦ ((y^2 − x^2)/(y^2 + x^2), 2xy/(y^2 + x^2))

be the isomorphism of Theorem 14.1.4(ii). There is an isomorphism η : Jc →


Gc (K), given as η(u + βv) = (u, v). Thus the composition θ = η ◦ ψ is the
isomorphism Ens (K) → Gc (K), as claimed.
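Proposition 14.4.1 can be sanity-checked over a small field. The sketch below (ours) takes K = F11 and c = 3 (a quadratic residue, since 5^2 ≡ 3 (mod 11)) and verifies that θ maps Ens (F11 ) bijectively onto G3 (F11 ):

```python
# Check that θ of Proposition 14.4.1 is a bijection E_ns(F_11) → G_3(F_11).
p, c = 11, 3

def inv(a):
    return pow(a, p - 2, p)      # inverse mod p, p prime

# Non-singular affine points of y^2 = x^2(x + c); (0, 0) is the singular point.
E = [(x, y) for x in range(p) for y in range(p)
     if (y * y - x * x * (x + c)) % p == 0 and (x, y) != (0, 0)]
G = {(x, y) for x in range(p) for y in range(p) if (x * x - c * y * y) % p == 1}

def theta(P):
    x, y = P
    d = inv((y * y - c * x * x) % p)   # denominator is x^3 ≠ 0 on the curve
    return ((y * y + c * x * x) * d % p, 2 * x * y * d % p)

images = {theta(P) for P in E} | {(1, 0)}   # θ(O) = (1, 0)
assert images == G and len(E) + 1 == len(G)
```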


Example 14.4.2 Let K = R and c = −1, so that Ens (R) is the group of non-
singular points on the curve y^2 = x^2 (x − 1) and G−1 (R) is the circle group. The
group isomorphism

θ : Ens (R) → G−1 (R)

is defined as

θ(x, y) = ((y^2 − x^2)/(y^2 + x^2), 2xy/(y^2 + x^2)), θ(O) = (1, 0).

For instance, θ (1, 0) = (−1, 0); see Figure 14.2.




Example 14.4.3 We take K = F7 , c = 3, so that Ens (F7 ) is the group of
non-singular points on the curve y^2 = x^2 (x + 3) and G3 (F7 ) is the group of
Example 14.3.4. We have

Ens (F7 ) = {(1, 2), (1, 5), (4, 0), (5, 2), (5, 5), (6, 3), (6, 4), O},

and

G3 (F7 ) = {(1, 0), (6, 0), (2, 1), (5, 1), (0, 3), (0, 4), (2, 6), (5, 6)}.

The group isomorphism

θ : Ens (F7 ) → G3 (F7 )

is defined as

θ(x, y) = ((y^2 + 3x^2)/(y^2 − 3x^2), 2xy/(y^2 − 3x^2)), θ(O) = (1, 0).

We have

θ(O) = (1, 0), θ(1, 2) = (0, 4), θ(1, 5) = (0, 3), θ(4, 0) = (6, 0),

θ(5, 2) = (5, 1), θ(5, 5) = (5, 6), θ(6, 3) = (2, 6), θ(6, 4) = (2, 1).

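The values of θ can be recomputed directly; the sketch below (ours) checks that θ is a bijection from Ens (F7 ) onto G3 (F7 ). In particular, the computation gives θ(6, 3) = (2, 6) and θ(6, 4) = (2, 1).

```python
# Recompute θ over F_7 (c = 3) and check bijectivity onto G_3(F_7).
p, c = 7, 3

def inv(a):
    return pow(a, p - 2, p)

E = [(x, y) for x in range(p) for y in range(p)
     if (y * y - x * x * (x + c)) % p == 0 and (x, y) != (0, 0)]

def theta(P):
    x, y = P
    d = inv((y * y - c * x * x) % p)
    return ((y * y + c * x * x) * d % p, 2 * x * y * d % p)

assert theta((1, 2)) == (0, 4) and theta((1, 5)) == (0, 3)
assert theta((4, 0)) == (6, 0)
assert theta((5, 2)) == (5, 1) and theta((5, 5)) == (5, 6)
assert theta((6, 3)) == (2, 6) and theta((6, 4)) == (2, 1)

G = {(x, y) for x in range(p) for y in range(p) if (x * x - c * y * y) % p == 1}
assert {theta(P) for P in E} | {(1, 0)} == G    # θ(O) = (1, 0)
```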


14.5 Exercises

1. Verify that the formulas given in Proposition 14.1.2 define a binary operation
on Ens (K).
2. Compute the points of Ens (F7 ) defined by y^2 = x^2 (x + 2).
3. Show that Ens (F7 ) is a group under the binary operation of Proposition 14.1.2.
4. Prove Proposition 14.1.3.
5. Let p be an odd prime, and let Ens (Fp ) be the group of non-singular points
   defined by y^2 = x^2 (x + c), c ≠ 0. Suppose that c is a quadratic residue modulo
   p, i.e., β^2 = c for some β ∈ Fp^×.

   (a) Show that the map ψ : Ens (Fp ) → Fp^× defined by ψ(x, y) = (y + βx)/(y − βx),
       ψ(O) = 1, is an isomorphism of groups.


(b) Let Ens (F7 ) be the group given in Exercise 3. Use part (a) to solve the DLP

m(6, 1) = (5, 0)

in Ens (F7 ) by taking images under ψ, and then solving the DLP

ψ(6, 1)^m = ψ(5, 0)

in F7^×.
6. Prove Theorem 14.3.1.
7. Show that G−1 (Q) (circle group over Q) has an infinite number of points.

8. Let G−1 (R) be the circle group.

(a) Let P = (5/13, 12/13). Compute 2P and −P .
(b) Compute (−1, 0) + (√2/2, −√2/2).

9. Compute the points of G5 (F7 ).


10. Compute the points of G1 (K), where K is the field F9 from Example 7.3.14.
11. Let Ens (Q) be the group of non-singular points on the curve y^2 = x^2 (x − 4),
    and let θ : Ens (Q) → G−4 (Q) be the isomorphism defined as

    θ(x, y) = ((y^2 − 4x^2)/(y^2 + 4x^2), 2xy/(y^2 + 4x^2)), θ(O) = (1, 0).

(a) Compute θ (5, −5).


(b) Find the preimage of (0, 1/2) ∈ G−4 (Q) under θ.

12. Let θ : Ens (R) → G−1 (R) be the isomorphism of groups of Example 14.4.2.
(a) Prove that J = {O, (4/3, 4/(3√3)), (4/3, −4/(3√3))} is a subgroup of Ens (R).
(b) Find the image of J under θ .
13. Is Gc (Fp ) a good choice for the group in a Diffie–Hellman key exchange
protocol? Why or why not?
14. Let E(R) be the elliptic curve group given by y^2 = x(x + ε)(x + c) for
    ε > 0, c > 0. Let Ens (R) be the group of non-singular points on the curve
    y^2 = x^2 (x + c). Let mP = Q be a DLP in E(R). Show that a solution to
    this DLP can be approximated by a solution to some other DLP in Ens (R) and
    hence in R^×.
References

1. W. Alexi, B. Chor, O. Goldreich, C.-P. Schnorr, RSA/Rabin bits are 1/2 + 1/poly(log N) secure,
in IEEE 25th Symposium on Foundations of Computer Science (1984), pp. 449–457
2. J.-P. Allouche, J.O. Shallit, Automatic Sequences (Cambridge University Press, Cambridge,
2003)
3. S. Baase, A. Van Gelder, Computer Algorithms: Introduction to Design and Analysis (Addison-
Wesley, Reading, 2000)
4. G. Baumslag, B. Fine, M. Kreuzer, G. Rosenberger, A Course in Mathematical Cryptography
(De Gruyter, Berlin, 2015)
5. E.R. Berlekamp, Factoring polynomials over large finite fields. Math. Comput. 24, 713–735
(1970)
6. J. Berstel, C. Reutenauer, Noncommutative Rational Series with Applications (Cambridge
University Press, Cambridge, 2011)
7. I. Blake, G. Seroussi, N. Smart, Elliptic Curves in Cryptography (Cambridge University Press,
Cambridge, 1999)
8. L. Blum, M. Blum, M. Shub, A simple unpredictable pseudo-random number generator. SIAM
J. Comput. 15(2), 364–383 (1986)
9. E.R. Canfield, P. Erdős, C. Pomerance, On a problem of Oppenheim concerning “factorisatio
numerorum”. J. Num. Theory 17(1), 1–28 (1983)
10. D. Chaum, E. van Heijst, B. Pfitzmann, Cryptographically strong undeniable signatures,
unconditionally secure for the signer, in Advances in Cryptology CRYPTO’91. Lecture Notes
in Computer Science, vol. 576 (Springer, Berlin, 1992), pp. 470–484
11. L.N. Childs, Cryptology and Error Correction: An Algebraic Introduction and Real-World
Applications Springer Undergraduate Texts in Mathematics and Technology (Springer, Cham,
2019)
12. L.N. Childs, Taming Wild Extensions: Hopf Algebras and Local Galois Module Theory. AMS:
Mathematical Surveys and Monographs, vol. 80 (American Mathematical Society, Providence,
2000)
13. G. Christol, Ensembles presque périodiques k-reconnaissables. Theor. Comput. Sci. 9(1), 141–
145 (1979)
14. G. Christol, T. Kamae, M.M. France, G. Rauzy, Suites algébriques, automates et substitutions.
Bull. Soc. Math. France 108, 401–419 (1980)
15. A. Cobham, On the base-dependence of sets of numbers recognizable by finite automata. Math.
Syst. Theory 3, 186–192 (1969)
16. A. Cobham, Uniform tag sequences. Math. Syst. Theory 6, 164–192 (1972)

© Springer Nature Switzerland AG 2022


R. G. Underwood, Cryptography for Secure Encryption, Universitext,
https://doi.org/10.1007/978-3-030-97902-7

17. M. Coons, P. Vrbik, An irrationality measure of regular paperfolding numbers. J. Int. Seq. 15,
1–10 (2012)
18. D. Coppersmith, H. Krawczyk, Y. Mansour, The Shrinking Generator (IBM T. J. Watson
Research Center, New York, 1998)
19. M.M. Eisen, C.A. Eisen, Probability and its Applications (Quantum, New York, 1975)
20. A. Godušová, Number field sieve for discrete logarithm, Masters Thesis, Charles University,
Prague (2015)
21. C. Greither, B. Pareigis, Hopf Galois theory for separable field extensions. J. Algebra 106,
239–258 (1987)
22. R. Haggenmüller, B. Pareigis, Hopf algebra forms on the multiplicative group and other groups.
Manuscripta Math. 55, 121–135 (1986)
23. D.W. Hardy, C.L. Walker, Applied Algebra: Codes, Ciphers, and Discrete Algorithms (Pearson,
New Jersey, 2003)
24. K. Hoffman, R. Kunze, Linear Algebra, 2e (Prentice-Hall, New Jersey, 1971)
25. J. Hoffstein, J. Pipher, J.H. Silverman, An Introduction to Mathematical Cryptography.
Undergraduate Texts in Mathematics Book Series (Springer, New York, 2008)
26. J.E. Hopcroft, J.D. Ullman, Introduction to Automata Theory, Languages, and Computation
(Addison-Wesley, Reading, 1979)
27. J.E. Hopcroft, R. Motwani, J.D. Ullman, Introduction to Automata Theory, Languages, and
Computation, 3e (Addison-Wesley, Reading, 2007)
28. K. Ireland, M. Rosen, A Classical Introduction to Modern Number Theory, Graduate Text in
Mathematics, vol. 84, 2nd edn. (Springer, New York, 1990)
29. L. Işik, A. Winterhof, Maximum-order complexity and correlation measures (2017).
arXiv:1703.09151
30. C.J.A. Jansen, The maximum order complexity of sequence ensembles, in Advances in
Cryptology-EUROCRYPT ’91, ed. by D.W. Davies. Lecture Notes in Computer Science, vol.
547 (Springer, Berlin, 1991), pp. 153–159
31. C.J.A. Jansen, D.E. Boekee, The shortest feedback shift register that can generate a given
sequence, in Advances in Cryptology-CRYPTO’89, ed. by G. Brassard. Lecture Notes in
Computer Science, vol. 435 (Springer, Berlin, 1990), pp. 90–99
32. N. Koblitz, A Course in Number Theory and Cryptography. Graduate Text in Mathematics,
vol. 114 (Springer, New York, 1987)
33. N. Koblitz, Algebraic Aspects of Cryptography (Springer, Berlin, 1998)
34. A. Koch, T. Kohl, P. Truman, R. Underwood, The structure of Hopf algebras acting on dihedral
extensions, in Advances in Algebra. SRAC 2017, ed. by J. Feldvoss, L. Grimley, D. Lewis,
A. Pavelescu, C. Pillen. Springer Proceedings in Mathematics & Statistics, vol 277 (Springer,
Cham, 2019)
35. A.G. Konheim, Cryptography: A Primer (Wiley, New York, 1981)
36. S. Lang, Algebra, 2nd edn. (Addison-Wesley, Reading, 1984)
37. W. Mao, Modern Cryptography (Prentice-Hall, New Jersey, 2004)
38. A.J. Menezes, T. Okamoto, S.A. Vanstone, Reducing elliptic curve logarithms to logarithms in
a finite field. IEEE Trans. Inf. Theory 39(5), 1639–1646 (1993)
39. A.J. Menezes, P.C. van Oorschot, S.A.Vanstone, Handbook of Applied Cryptography (CRC
Press, Boca Raton, 1997)
40. L. Mérai, A. Winterhof, On the N th linear complexity of automatic sequences. J. Num. Theory
187, 415–429 (2018)
41. B. Pareigis, Forms of Hopf Algebras and Galois Theory. Topics in Algebra, Banach Center
Publications, vol. 26, Part 1 (PWN Polish Scientific Publishers, Warsaw, 1990)
42. J.M. Pollard, Theorems on factorizations and primality testing. Proc. Cambridge. Phil. Soc. 76,
521–528 (1974)
43. J.M. Pollard, A Monte Carlo method for factorization. Nor. Tid. Inform 15, 331–334 (1975)
44. C. Pomerance, A tale of two sieves. Not. Am. Math. Soc. 43(12), 1473–1485 (1996)
45. P. Popoli, On the maximum order complexity of the Thue-Morse and Rudin-Shapiro sequences
along polynomial values (2020). arXiv: 2011.03457

46. M. Rigo, Formal Languages, Automata and Numeration Systems 1 (Wiley, New Jersey, 2014)
47. K. Rosen, Elementary Number Theory and Its Applications, 6th edn. (Addison-Wesley, Boston,
2011)
48. J.J. Rotman, Advanced Modern Algebra (Prentice-Hall, New Jersey, 2002)
49. E. Rowland, What is. . . an automatic sequence? Not. Am. Math. Soc. 62, 274–276 (2015)
50. S. Rubinstein-Salzedo, Cryptography. Springer Undergraduate Mathematics Series (Springer,
Cham, 2018)
51. W. Rudin, Principles of Mathematical Analysis, 3rd edn. (McGraw-Hill, New York, 1976)
52. I.R. Shafarevich, Basic Algebraic Geometry (Springer, New York, 1974)
53. D. Shanks, Class number, a theory of factorization and genera, in Proceedings of Symposium of
Pure Mathematics, vol. 20 (American Mathematical Society, Providence, 1971), pp. 415–440
54. C.E. Shannon, A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423
(1948)
55. C.E. Shannon, A mathematical theory of communication. Bell Syst. Tech. J. 27, 623–656
(1948)
56. N.P. Smart, Cryptography Made Simple (Springer, Cham, 2016)
57. Z. Sun, A. Winterhof, On the maximum order complexity of the Thue-Morse and Rudin-
Shapiro sequence (2019). arXiv:1910.13723
58. Z. Sun, A. Winterhof, On the maximum order complexity of subsequences of the Thue-Morse
and Rudin-Shapiro sequence along squares. Int. J. Comput. Math.: Comput. Syst. Theory 4(1),
30–36 (2019)
59. J.T. Talbot, D. Welsh, Complexity and Cryptography: An Introduction (Cambridge University
Press, Cambridge, 2006)
60. R.G. Underwood, Fundamentals of Hopf Algebras. Universitext (Springer, Cham, 2015)
61. R.G. Underwood, Fundamentals of Modern Algebra: A Global Perspective (World Scientific,
New Jersey, 2016)
62. R. Underwood, Hopf forms and Hopf-Galois theory. www.scm.keele.ac.uk/staff/p_truman/
ConferenceArchive/2020Omaha/index.html
63. L.C. Washington, Elliptic Curves, Number Theory and Cryptography (Chapman/Hall/CRC,
Boca Raton, 2003)
64. J. Winter, Erratum to various proofs of Christol’s theorem. www.mimuw.edu.pl/~jwinter/
articles/christol.pdf
65. S. Woltmann, www.happycoders.eu/algorithms/merge-sort/#Merge_Sort_Time_Complexity
Index

A
(a mod n), 82
Abelian group, 75
Abstract probability space, 11
Advanced Encryption Standard (AES), 168
Affine cipher, 143
Algebra, 11
Algebraic
   closure, 128
   element, 128
   extension, 128
Algebraically closed, 128
Algorithm, 53
   exponential time, 60
   polynomial time, 54
   probabilistic polynomial time, 69
Alphabet, 74
   closure, 74
American Standard Code for Information Interchange (ASCII), 39
Anomalous elliptic curve, 292
Asymmetric cryptosystem, 3
Asymmetric key cryptosystem, 174
Average information, 30

B
Baby-Step/Giant-Step (BSGS), 259
Bernoulli’s theorem, 25
Binary operation, 73
   associative, 73
   closed, 85
   commutative, 73
Binomial distribution function, 22
Binomial random variable, 21
Birthday paradox, 18
Bit generator, 164
   Blum–Blum–Shub, 249
   Blum–Micali, 243
   pseudorandom, 236
Bit-wise addition, 160
Block cipher, 147, 164
   iterated, 165
Blum–Blum–Shub (BBS), 248
   bit generator, 249
   sequence, 248
Blum–Micali (BM), 242
   bit generator, 243
   sequence, 242
Blum prime, 115
Bounded-error probabilistic polynomial time (BPP), 63
Bounded-error probabilistic polynomial time decidable, 63
Brautigan, R., 159
Brute-force, 5

C
Cartesian product, 84
Ceiling function, 17
Characteristic, 125
Characteristic polynomial, 219
Chinese Remainder Theorem, 96
Chosen plaintext attack, 4
Church–Turing thesis, 66
Cipher
   block, 147
Ciphertext only attack, 4
Circle group, 304


Classical definition of probability, 12
Classical probability space, 12
Closure of alphabet, 74
Cocks–Ellis public key cryptosystem, 200
Commutative ring, 103
Complexity
   linear, 231
   theory, 53
Computational Diffie–Hellman discrete logarithm problem (CDHDLP), 268
Concatenation, 74
Concave function, 33
Conditional entropy, 41
Conditional probability, 14
   distribution, 41
Congruent modulo n, 82
Convex function, 33
Coset
   left, 86
   right, 86
Cryptanalysis, 4
   frequency analysis, 150
Cryptographic hash function, 210
Cryptography, 1
Cryptology, 4
Cryptosystem, 3
   asymmetric, 3
   asymmetric key, 174
   attacks, 4
   monoalphabetic, 155
   polyalphabetic, 155
   public key, 4
   simple substitution, 140
   symmetric, 2
Cycle, 80
   decomposition, 80
   length, 80
Cyclic group, 89

D
Data Encryption Standard (DES), 167
Decision problem, 61
   bounded-error probabilistic polynomial time decidable, 63
   instance, 61
   polynomial time decidable, 61
   probabilistic polynomial time decidable, 62
Decryption, 2
   key, 2
   keyspace, 2
   transformation, 2
Degree of field extension, 126
Diffie–Hellman discrete logarithm problem (DHDLP), 257
Diffie–Hellman
   discrete logarithm problem, 257
   key exchange protocol, 255
Digest, 209
Digital signature, 203
Digital signature scheme (DSS), 204
   chosen-message attack on, 208
   direct attack on, 208
   known-signature attack on, 208
   RSA, 205
Discrete exponential function, 193
Discrete Logarithm Assumption (DLA), 194
Discrete logarithm function, 193
Discrete logarithm problem, 257
   computational Diffie–Hellman, 268
   Diffie–Hellman, 257
   elliptic curve, 289
Discrete log hash function, 210
Discrete random variable, 19
Discriminant
   elliptic, 273
Distribution function, 19
   binomial, 22
   uniform, 20
Division ring, 104
Division Theorem, 104

E
ElGamal public key cryptosystem, 195
Elliptic curve, 276
   anomalous, 292
   supersingular, 291
Elliptic curve discrete logarithm problem (ECDLP), 289
Elliptic Curve key exchange protocol (ECKEP), 287
Elliptic discriminant, 273
Empty word, 36, 74
Encryption, 1
   key, 1
   keyspace, 1
   transformation, 1
Entropy, 29
   conditional, 41
   joint, 41
   rate, 38
Euler’s criterion, 113
Euler’s function, 107
Evaluation homomorphism, 124
Event, 10, 11

Exclusive and, 70
Existential forgery, 208
Exponential time algorithm, 60
Exponent of a group, 92

F
Factoring Assumption (FA), 180
Factor Theorem, 105
Feistel cipher, 165
Fermat factorization, 187
   modular, 189
Fermat prime, 200
Field, 104
   extension, 104
   finite, 128
   Galois, 130
Finite field, 128
Finite group, 75
Finitely generated group, 89
Floor function, 54
Forgery, 208
   existential, 208
Frequency analysis, 150

G
Galois field, 130
Generalized Weierstrass equation, 276
Generating set, 89
Generator for a cyclic group, 89
Graph of an equation, 272
Greatest common divisor (gcd), 92
Group, 75
   abelian, 75
   cyclic, 89
   direct product, 85
   exponent, 92
   finite, 75
   finitely generated, 89
   homomorphism, 87
   infinite, 75
   isomorphism, 88
   isomorphism class, 88
   order, 75
   of residues modulo n, 83
   symmetric, 78
   of units, 106

H
Hash function, 209
   cryptographic, 210
   discrete log family, 210
   MD-4 family, 212
   regular, 210
Hash-then-sign RSA DSS, 214
Hill 2 × 2 cipher, 146
Homomorphism
   group, 87
   ring, 123
Hopf algebra, 306
Hybrid cipher, 196

I
Ideal, 117
   maximal, 120
   principal, 117
   sum, 120
Identity element, 74
Independent events, 14
Index calculus, 262
Infinite group, 75
Information in an event, 30
Initial state vector, 218
Intersection of sets, 11
Invented root, 122
Inverse (multiplicative), 104
Irreducible polynomial, 121
Iterated block cipher, 165

J
Jacobi symbol, 113
Joint entropy, 41
Joint probability distribution, 40

K
Kasiski method, 159
Kernel of a ring homomorphism, 124
Key encapsulation mechanism/data encapsulation mechanism (KEM/DEM), 196
Key exchange protocol
   Diffie–Hellman, 255
   elliptic curve, 287
Key stream, 163
Key trial, 5
Known plaintext attack, 4
Koblitz curve, 276

L
l-bit prime, 176
Lagrange’s Theorem, 87
cryptographic, 210 Law of Large Numbers, 25

Least common multiple (lcm), 94 Pollard’s p − 1 algorithm, 182


of a finite set, 95 Polyalphabetic cryptosystem, 155
Left transversal, 86 Polynomial
Legendre symbol, 112 irreducible, 121
Letter, 74 minimal, 219
Linear complexity, 231, 233 positive, 63
Linear form, 272 primitive, 132
Linearly recursive sequence, 218 reducible, 121
homogeneous, 218 time algorithm, 54
matrix of a, 219 time decidable decision problem, 61
Polynomial time (P), 61
Positive polynomial, 63
M Power set, 12
Man-in-the-Middle attack, 267 Predicate, 237
Maximal ideal, 120 hard-core, 237
MD-4 hash function, 212 Prime Number theorem, 189
Mersenne prime, 176 Primitive polynomial, 132
Message authentication, 203 Primitive root, 110
Minimal polynomial, 219 Principal ideal, 117
Modular Fermat factorization, 189 Private key, 174
Monoalphabetic cryptosystem, 155 Probabilistic polynomial time (PP), 62
Monoid, 74 Probabilistic polynomial time decidable
Monoid of words, 74 decision problem, 62
MOV attack, 290 Probability
Multiplicative inverse, 104 classical definition, 12
Mutual information, 42 conditional, 14
Probability function, 11
classical, 12
N Pseudorandom bit generator, 236
Negligible function, 174 Public key, 174
Next-bit test, 236 Public key cryptosystem, 4
Nondeterministic experiment, 9 Cocks–Ellis, 200
ElGamal, 195
RSA, 178
O
Occurrence of event, 10
1’s complement, 55 Q
One-way trapdoor function, 175 Quadratic form, 274
Order Quadratic residue, 112
of an element, 91 Quadratic Residue Assumption (QRA), 246
of a function, 54 Quadratic sieve, 192
of a group, 75 Quotient ring, 118
notation, 54
of a polynomial, 130
R
Random variable
P binomial, 21
Partition, 11 discrete, 19
Perfect secrecy, 162 distribution function of a, 19
Permutation, 77 Recurrence relation, 218
even, 81 Reducible polynomial, 121
notation, 77 Redundancy rate, 38
odd, 81 Redundancy ratio, 39
Plaintext, 1 Regular hash function, 210

Relative frequency, 10
Relatively prime, 94
Residue modulo n, 81
Ring, 101
   commutative, 103
   homomorphism, 123
   of integers, 101
   isomorphism, 124
   of matrices, 102
   of polynomials, 102
   quotient, 118
   with unity, 103
Round function, 165
Round key, 165
RSA digital signature scheme, 205
RSA public key cryptosystem, 178
Running time, 53

S
Safe prime, 210, 250
Sample space, 9, 11
Selective forgery, 208
Selector sequence, 226
Semigroup, 74
Sequence
   eventually periodic, 218
   linearly recursive, 218
   periodic, 218
   period of a, 218
   shrinking generator, 226
Shared-key cryptosystem, 2
Shift cipher, 144
Shrinking generator sequence, 226
σ-algebra, 11
Signature with privacy, 207
Simple algebraic extension field, 126
   degree, 126
Simple substitution cryptosystem, 140
Singular graph, 273
Singular point, 273
Smooth graph, 273
Smooth integer, 189, 262
Spurious key, 45
Square root attack, 18
Subexponential time, 60
Subfield, 104
Subgroup, 85
   cyclic, 90
   index of, 87
   proper, 85
   trivial, 85
Subring, 103
Sum of ideals, 120
Supersingular, 291
Symmetric cryptosystem, 2
Symmetric group on n letters, 78

T
Tangent cone, 275
Tangent space, 272
Thue–Morse sequence, 253
   along a polynomial, 253
Total Probability theorem, 15
Total variation distance, 153
Trace of an elliptic curve, 290
Transposition, 81
Trapdoor, 174
Turing machine, 66
2’s complement, 56
2-safe prime, 250

U
Unary representation, 69
Unbiased, 237
Undeniability, 204
Unforgeability, 204
Unicity distance, 7, 49
Uniform distribution function, 20
Union of sets, 11
Unit, 103

V
Vernam cipher, 160
Vigenère cipher, 156

W
Weierstrass equation, 276
   generalized, 276
Wilson’s theorem, 108
Word, 74
   length, 74

Z
Zero divisor, 104
