A Security Analysis of Pretty Good Privacy
All content following this page was uploaded by Sieuwert Van Otterloo on 09 January 2013.
September 7, 2001
Contents
1 Introduction
1.1 What is PGP?
1.2 How PGP works
1.3 Mathematics
1.4 Key size
1.5 Web of Trust
1.6 Vulnerabilities beyond PGP
3 Algorithms in PGP
3.1 Kinds of algorithms
3.2 Symmetric block ciphers
3.3 Public key algorithms
3.4 Hash functions
3.5 Secret sharing algorithms
6 Sourcecode analysis
6.1 How the code was analysed
6.2 Memory management
6.3 File Parsing
6.4 Public key code
6.5 Random numbers
6.6 The XXX comment
6.7 Conclusions
7 Conclusion
7.1 Is PGP broken?
7.2 Does PGP contain a backdoor?
7.3 Which version of PGP is best?
7.4 How can PGP be more secure?
Preface
Congratulations on your acquisition of A security analysis of PGP. I hope you enjoy reading
this text, and that it will encourage you to learn more about cryptography. To help you read
more about cryptography, there is a CD-Rom accompanying this paper. It contains many of the
sources that were used to make this text, the PGP program, and some otherwise useful programs.
One of the highlights is the complete text of the Handbook of applied cryptography [13]. This
book is a complete introduction to cryptography, and much of what is mentioned in this text can
be found there, even if the handbook is not cited. If you want to buy a book, I recommend Applied
Cryptography [18], which is not as good as the handbook but cheaper and nice to read. Those who
live in the neighborhood can borrow it from me.
Figure 1: Bruce Schneier’s book
To further enhance your reading pleasure, I have resisted
starting all chapters with the crypto truisms that you will see
often enough, like
It may roundly be asserted that human ingenuity
cannot concoct a cipher which human ingenuity
cannot resolve. (Edgar Allan Poe)
Anyone who considers arithmetical methods of
producing random digits is, of course, in a state
of sin. (John von Neumann)
I did include a few illustrations to enhance the glance-
through-factor, but I am afraid it is still a very heavy text.
I also added an index that will help you find definitions and explanations of important terms and
acronyms. All words in this font are in the index.
• How secure is PGP? Are there any known or unknown weaknesses or backdoors?
• What are the differences between the different versions, editions and distributions of PGP?
Which versions are secure, and which are not?
One of the best sources of information was the sourcecode: one of the main features of PGP
was that this code was open for public inspection, so that you did not need to rely upon the skills
and intentions of computer programmers you do not know. One of the goals was to do a close
sourcecode analysis.
The discoveries of Ralph Senderek (see section 5.4) were one of the inspirations for me to do this
research. When I began with cryptography I doubted that widely known and famous programs
like PGP could contain serious or interesting bugs, but discoveries like Senderek’s forced me to
alter my thoughts. Senderek did not make his discoveries by sourcecode analysis, but by using an
experimental approach, analysing the input and output of PGP. Therefore the actual sourcecode
analysis takes only one chapter, and other methods have been tried as well.
As an introduction, we wanted to describe the history of PGP, including all controversies. It is
not hard to find information on this subject, since many people feel that the PGP story is worth
telling, but putting it all together, and retrieving the right dates was harder. The full story does
not say anything about the security of PGP, but we hope that the story contributes to a better
understanding of the design decisions of PGP, and that it gives insight into the social aspects of
cryptography.
As a further aid to understanding what the program is doing, we ran the program with debugging
aids like SoftIce and IDA PRO, and with the utilities NT file monitor and registry monitor from
www.sysinternals.com. We only studied PGP on Windows NT, because we reason that this is the
way companies use PGP, and it was the platform at hand. We did not focus on studying the
interaction of PGP and the operating system (OS), because that takes a lot of detailed knowledge
of the OS (that we do not have nor want to have), and all errors discovered would most likely be
errors in the operating system. It is widely known that Windows is insecure, and we made no
effort to prove that point further.
I would like to thank the people who helped me make this paper: Leon Kuunders, Beernd
Noordkamp, Ernst van Rheenen, Phil Zimmermann and Werner Koch.
The official website of my PGP research is
www.bluering.nl/pgp
This thesis can be downloaded there. If you find any error or omission, or just want me to know
what you think of it, please contact me:
Introduction
Pretty Good Privacy (PGP) is a computer program for personal privacy. It delivers privacy by
making it impossible for other people to read your computer files and email, and by making it
impossible for other people to impersonate you. To achieve this it uses cryptographic techniques:
encryption and digital signatures. Most people are familiar with normal, conventional encryption,
where the sender and receiver share a secret password (often called a key) that is used to encrypt
and later decrypt messages. These systems are simple to understand but hard to use, because you
must meet someone in private to agree on a password first, and you must exchange many passwords:
a group of 100 people communicating with each other needs to exchange 100 ∗ 99/2 = 4950 keys.
To overcome these problems PGP implements a Public Key Infrastructure (PKI). PKI systems need
far fewer keys (100 key pairs for a community of 100 people), and can also be used for unforgeable
digital signatures. A drawback is that PKI systems are more complicated.
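The key counts above follow from simple counting: with conventional encryption every pair of people needs its own shared key, while a public key system needs one key pair per person. A minimal sketch in Python (the function names are only illustrative and are not part of PGP):

```python
def symmetric_keys_needed(people: int) -> int:
    """Shared secret keys needed so that every pair of people has its own key."""
    return people * (people - 1) // 2


def key_pairs_needed(people: int) -> int:
    """Key pairs needed when every person publishes one public key."""
    return people


for n in (10, 100, 1000):
    print(n, symmetric_keys_needed(n), key_pairs_needed(n))
```

The first count grows quadratically with the size of the group and the second only linearly, which is why the difference matters more for large communities.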
PGP is even harder to understand and use because it is a distributed system: There is no
Central Server trusted by everyone to keep track of the infrastructure. All users need to make
decisions about whom to trust and about the correctness of keys themselves. It would have been
easier to leave those decisions to dedicated, trustable organisations.
PGP started as a small utility. I guess the original idea behind PGP was that PGP protected
the communication between computers using the new RSA algorithm. The threat model was a
government agency eavesdropping on individuals. During its lifetime the threat model changed,
because computers became multitasking and internet-connected. Not only the communications
needed to be protected, but also the computer itself had to be guarded against hostile take-overs.
File erasing, hard disk encryption, a TEMPEST safe viewer and network encryption were added.
A user-friendly Windows version was created, along with all kinds of features, like keys with
photos, integration into mail programs, and Windows integration. The intended user group shifted
from individual people to companies, because only companies actually paid for the program.
PGP is famous because it was one of the first publicly available programs with strong
cryptography, cryptography that could stop major governments. It was free, people could read the
source if they wanted to, and "it was so good that it was illegal". A large group of hackers
supported and co-developed the program. Later on it was no longer illegal and was owned by a large
company, so rumors about backdoors arose ("If it is good, why isn't it illegal?").
• to manage the collection of PGP keys you have: set properties, import or export, validate,
etc.
After you have installed PGP you create a key pair for yourself, and you import the public keys of
your friends. You export the public part of your key. Now you can encrypt files: You select from
the keyring the names of the people who are allowed to read the message, and PGP creates an
encrypted file that can only be decrypted by the owners of the selected keys. You can also sign
files: You select a private
key, enter the pass phrase of that key, and PGP either creates a file with the signed data and the
signature, or a separate signature. It is also possible to combine these two actions, and sign and
encrypt a file at the same time. If this is the case, the signature can only be verified after the file
is decrypted.
If you receive a signed file, you can doubleclick it and PGP will verify the signature. If you
receive an encrypted file, you can open it and PGP will ask for your passphrase so that it can use
your private key to decrypt the message. Further things to do are signing keys if they are valid
and setting the trust values of the public keys you collected. This is described in the web of trust
section below.
1.3 Mathematics
Cryptographic algorithms are often based on mathematical structures, especially on group theory
and number theory. I do not intend to give a crash-course in number theory here, because good
books on the subject already exist, but I would like to introduce the properties used and their
notation here. Group theory is not explained here, because until now PGP only uses algorithms
based on groups containing integer numbers.
size(a) The size of a number is the number of bits needed to write down the number in binary
form. size(a) ≃ log2 (a)
a ∗ b This denotes normal multiplication. I prefer to use an explicit symbol instead of ab. The
latter is more common in mathematical texts but I have a programming background, where
the * sign is needed.
gcd(a, b) The greatest common divisor of a and b. It denotes the largest number c with c | a
and c | b. For 0 < a ≤ b it can be calculated using the recurrence gcd(a, b) = gcd(b − a, a),
together with gcd(0, b) = b. This is called the Euclidean algorithm. It is one of the oldest
algorithms (discovered in ancient Greece), and very important in number theory.
a mod n is the number c, 0 ≤ c < n for which there is a number k such that a = c + k ∗ n.
a + b mod n first calculate a + b (this is approximately one bit larger than a or b), and then apply
mod n. For each n, all numbers c, 0 ≤ c < n form a group with + mod n as the group
operation.
a ∗ b mod n first calculate a ∗ b (this has roughly size size(a) + size(b)), and then apply mod n.
$a^b \bmod n$ This is usually done with $b \in Z_n$, and then $a^b$ has roughly size $b * size(a)$. This is
often too large to fit in memory, so this expression is calculated by writing b in binary form,
like $b = 2^{i_0} + 2^{i_1} + \ldots + 2^{i_m}$, calculating $k_0 = a \bmod n$, $k_1 = k_0^2 \bmod n$,
$k_2 = k_1^2 \bmod n$, ... (so that $k_i = a^{2^i} \bmod n$), and then calculating the answer with
$a^b \bmod n = ((k_{i_0} * k_{i_1}) \bmod n * k_{i_2}) \bmod n * \ldots * k_{i_m} \bmod n$.
$a^{-1} \bmod n$ The above procedure is only applicable for 0 ≤ b < n. $a^{-1} \bmod n$ is defined to be the
number such that $a^{-1} * a \bmod n = 1$. It exists if gcd(a, n) = 1. It can be calculated using
some intermediary values you get when calculating gcd(a, n) using the (extended) Euclidean
algorithm.
$Z_n$ The additive group with all numbers x, 0 ≤ x < n as members and with addition modulo n as
group operation: c = a + b mod n.
$Z_n^*$ The multiplicative group with all numbers x, 0 < x < n, gcd(x, n) = 1 as members. The group
operation is c = a ∗ b mod n.
a ⊕ b The bitwise exclusive OR of two integers. It is calculated by writing the inputs in binary
form, and setting a bit of output 1 if the input bits are different, 0 if they are the same. For
instance 12 ⊕ 10 = 1100 ⊕ 1010 = 0110 = 6.
ord(g) The order of an element g in a group is the smallest positive number k such that $g^k = 1$.
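To make the notation above concrete, here is a minimal Python sketch of the greatest common divisor, of exponentiation modulo n by repeated squaring, and of the modular inverse via the extended Euclidean algorithm. The function names are my own and this is not the code used in PGP. The gcd below uses the remainder form of the Euclidean algorithm, which gives the same result as repeated subtraction.

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor via the Euclidean algorithm (remainder form)."""
    while b:
        a, b = b, a % b
    return a


def mod_pow(a: int, b: int, n: int) -> int:
    """Compute a**b mod n by square-and-multiply, for b >= 0 and n >= 1."""
    result = 1 % n
    square = a % n              # holds a**(2**i) mod n for the current bit i
    while b > 0:
        if b & 1:               # bit i of b is set: multiply this power in
            result = (result * square) % n
        square = (square * square) % n
        b >>= 1
    return result


def mod_inverse(a: int, n: int) -> int:
    """Inverse of a modulo n via the extended Euclidean algorithm."""
    old_r, r = a % n, n
    old_s, s = 1, 0             # invariant: old_s * a = old_r (mod n)
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        raise ValueError("inverse exists only if gcd(a, n) == 1")
    return old_s % n


print(gcd(12, 18))
print(mod_pow(4, 13, 497), pow(4, 13, 497))   # both values should agree
print(mod_inverse(3, 7))
print(12 ^ 10)                                # bitwise exclusive OR
```

No intermediate value in mod_pow ever grows beyond roughly twice the size of n, which is the point of reducing after every multiplication.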
raw key size Sraw(K): For a key K, the raw key size Sraw(K) is the number of bits needed to
write down K. We can define, for an algorithm or crypto system A, Sraw(A) = Sraw(K) if
K is a key for A and all keys for A have the same raw size.
real key size Sreal(A): The real key size of a certain algorithm/crypto system A (with all
parameters fixed) is the 2-based logarithm of the number of possible keys. Note that the different
keys of A need not have the same raw key size. This key size relates to the hardness of a
brute force attack: trying all possible keys.
effective key size Seff(A): The effective key size, or the strength of A, is the 2-based logarithm
of the number of keys that must be tried to break the system, or of the number of actions
an attacker must do to break the system. Note that the attacker might have a better attack
than a brute force attack, so that the strength can be less than the real key size.
As some sort of proof of this fact: The Electronic Frontier Foundation built a specialised DES
cracking machine that could search through $2^{56}$ keys in a few hours. The American government has
long restricted the key size to 40 bits. That is a million times less secure than insecure.
This fact needs redefinition in the future because computers get faster, and the number of computers
is also increasing. In fact, some people have already gone from 80 to 128 bits, and some (paranoid)
people are already thinking of even larger key sizes, because modern computers can handle them and
having more key bits certainly does not decrease the security. However, increasing the key size
often improves one of the strong spots of PGP, while attackers usually focus on a weak spot.
For block ciphers, designers try to make Sraw(A) = Sreal(A) = Seff(A), and they can choose
Sraw(A) equal to 128 bits to have some sort of safety margin. Since there are usually no constraints
on the keys, Sraw(A) = Sreal(A) can be achieved easily. One notorious counterexample is DES: DES
had an 8-byte key (64 bits), but the last bit of every byte was not used, so Sraw(DES) = 64 and
Sreal(DES) = 56. Triple DES suffers the same problem: Sraw(3DES) = 192 and Sreal(3DES) =
168 (and Seff(3DES) ≤ 112 because of a meet-in-the-middle attack).
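The DES figures can be checked by counting keys. The sketch below uses only the numbers given in the text (8 key bytes, one unused bit per byte, three keys for triple DES); the helper name s_real is an illustrative assumption, not an existing function.

```python
from math import log2


def s_real(number_of_keys: int) -> float:
    """Real key size: the 2-based logarithm of the number of possible keys."""
    return log2(number_of_keys)


# DES: 8 key bytes, but one bit of every byte is not used by the cipher.
des_raw = 8 * 8
des_real = s_real(2 ** (8 * 7))

# Triple DES uses three DES keys, so both sizes triple.
tdes_raw = 3 * des_raw
tdes_real = s_real(2 ** (3 * 8 * 7))

print("DES :", des_raw, des_real)
print("3DES:", tdes_raw, tdes_real)
```

The effective key size cannot be computed this way, because it depends on the best known attack and not on the number of keys.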
In the distant future scientists might develop a quantum computer. This is a device based upon
the principles of quantum mechanics that can do calculations on many numbers at the same time.
Quantum computers can do no more than reduce the amount of work to its square root (according to
Schneier, 10th Oct 1998 posting to soc.history.what-if USENET group; see the pgpdhrsafaq on the
CD), so people worried about quantum computers should use double size keys: 160 bit or 256 bit
keys instead of 80 or 128 bit keys. If quantum computers work the way we expect them to,
Seff(A) ≤ Sreal(A)/2, but please take this with a grain of salt because quantum computers are
still in the distant future.
For hash functions H there is no key, so the size of the output is taken as the 'key size' of
the hash function. The birthday attack ensures that Seff(H) ≤ Sreal(H)/2. Therefore Sreal(H)
must be at least 160. MD5 fails this requirement, but since Sreal(MD5) = 128 > 2 ∗ 60 it is not by
definition insecure. Let's call it dubious.
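The birthday bound can be tried out on a deliberately weakened hash function. The sketch below truncates SHA-256 (chosen only because it ships with Python's hashlib; it is not one of the hash functions discussed here) to 24 bits and searches for two messages with the same truncated hash. The helper names are illustrative assumptions.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, bits: int) -> int:
    """The first `bits` bits of SHA-256(data), as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_collision(bits: int):
    """Hash the messages b'0', b'1', b'2', ... until two of them collide."""
    seen = {}
    for i in count():
        message = str(i).encode()
        h = truncated_hash(message, bits)
        if h in seen:
            return seen[h], message, i + 1
        seen[h] = message


first, second, tries = find_collision(24)
print(first, second, tries)
```

With a 24 bit output there are 2^24 possible hash values, but the search typically stops after a few thousand messages, in the order of 2^12, which is the square root behaviour the birthday attack predicts.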
For public key algorithms the matter is more complicated. For the 'key-size' of a PK algorithm
one often takes the 'group size' as the size of the key. Sraw(PK) is often larger, because you need
to take all the numbers into account to get the raw key size. There is also a difference between the
secret part of the key and the public part of the key. For Seff(PK) only the secret part is relevant.
Furthermore it often holds that Sraw(PK) > Sreal(PK) because the numbers involved must have
special properties like being prime.
For public key algorithms we also want Seff(A) ≥ 80, and because of attacks similar to
the birthday attack, this requires Sreal(x) ≥ 160 with x a part of the secret key. These attacks are
described in section 3.3.2.
Public key algorithms often work in a certain kind of group. The name of the group is known,
but some things must be hard to do without the private key, so people should not find out the
structure of this group. In practice, for the algorithms in PGP, it means that the group must
have at least $2^{1024}$ members to offer an 80 bit strength against factoring algorithms. Very different
groups, like elliptic curves, could behave differently and need a different group size.
have never met. This web of trust is one of the main features of PGP. Other PKI infrastructures
rely on some central server to publish certificates, and ask each user to rely on those certificates.
That system has two disadvantages. It costs a lot of money: those certificates will not be free, and
they are often distributed with a limited lifetime, forcing you to pay a considerable sum of money
each year. The second problem with a central certification authority is, and this is a debatable
objection, that it is not wise to trust central authorities. You cannot check that they are careful,
and it is very easy for the government to manipulate these authorities. If you trust the government,
you might as well assume that people are friendly and the internet is safe, and stop bothering about
encryption.
The PGP philosophy gives you the right to choose whom to trust. Everyone maintains his own
database of trust, and creates his own certificates. You can share the information for free with your
friends, without any government control. The only drawback is that the PGP web of trust system
is hard to grasp, and it is easy to draw wrong conclusions. The most common pitfall is to confuse
the validity of someone’s key with the trustworthiness of the owner. In the first variant of the above
example, A relied on C’s signature, and his key was signed by his friend B. B did not state that
C was trustworthy. By signing someone’s key you only state that the key belongs to the person
whose name is on it. The decision to trust C was made by A, and if A does not know C, it was a
wrong decision.
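The difference between a valid key and a trusted owner can be sketched in a few lines of Python. This is only a toy model of the basic web of trust under my own assumptions: every person has one key, a signature is a pair (signer, signed key), and the names valid_keys and trusted_owners are hypothetical. It is not how PGP stores or evaluates its keyring.

```python
def valid_keys(own_key, signatures, trusted_owners):
    """Return the set of keys that can be considered valid.

    signatures     -- list of (signer, signed_key) pairs
    trusted_owners -- owners whose signatures the user chooses to rely on
    """
    valid = {own_key}
    changed = True
    while changed:
        changed = False
        for signer, signed in signatures:
            # A signature only counts if the signer's key is itself valid
            # AND the user has decided to trust the signer.
            if signer in valid and signer in trusted_owners and signed not in valid:
                valid.add(signed)
                changed = True
    return valid


signatures = [("A", "B"), ("B", "C"), ("C", "D")]

# A trusts himself and his friend B, but has made no decision about C.
print(sorted(valid_keys("A", signatures, {"A", "B"})))
# Only if A also decides to trust C does C's signature on D's key count.
print(sorted(valid_keys("A", signatures, {"A", "B", "C"})))
```

In the first call C's key becomes valid through B's signature, but D's key does not, because B's signature says nothing about whether C signs carefully.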
The above explanation is the simple, basic web of trust. PGP uses a more advanced system
using marginal or partial trust. The idea is that you should never trust people or keys totally:
Each extra step in the chain of reasoning about validity slightly reduces the probability that a key
is valid, and you should only use keys with a high enough probability of being valid. The chance
that a key is valid increases if you can establish two independent paths from its owner to you.
Another feature of the PGP system is the concept of meta-introducers. You can trust people
on different levels. For example
level 0 You can trust that you have a valid copy of B’s key.
level 1 You can trust that you have a valid copy of B’s key, and that B only signs valid keys. That
is: B is a trustworthy person.
level 2 You can trust that you have a valid copy of B’s key, and that B only signs valid keys of
trustworthy persons.
...
level n You trust that B’s key is only used to sign keys belonging to people of category n − 1.
(and that you have a valid copy of B’s key).
If you trust B on or above level 1, B is a meta-introducer for you. His signature can validate whole
families of keys. It seems not right to trust regular people on high levels, because few people are
that careful. The concept is useful because many organisations have special keys that are stored
extra securely, and only used to sign keys. The sysop of the organisation will verify all keys by
phone and sign them with the keysigning key, and everyone else in the company can trust the
keysigning key on level one.
Ueli Maurer [12] designed a probability model for reasoning with partial trust. This is the system
used in PGP. It uses the basic reasoning rules sketched above to reason about the validity of keys,
based on a set of initial beliefs about validity and the signatures given. Each of the initial
beliefs is independently given a confidence value that can be considered a probability between 0
and 1. For each statement that can be derived, the confidence is the probability that any set of
initial beliefs from which it can be derived is completely true. This leads to a system where
partially independent validity paths increase the confidence in a key’s validity in a plausible way.
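For the special case of completely independent paths the arithmetic is easy to sketch. The code below is a simplification under that assumption and not Maurer's full model, which also handles paths that share beliefs; the function names and the example confidence values are made up for illustration.

```python
from math import prod


def path_confidence(confidences):
    """A path only holds if every belief on it is true."""
    return prod(confidences)


def combined_confidence(paths):
    """Probability that at least one of several independent paths holds."""
    return 1 - prod(1 - path_confidence(path) for path in paths)


one_path = [[0.9, 0.8]]
two_paths = [[0.9, 0.8], [0.7, 0.9]]

print(combined_confidence(one_path))
print(combined_confidence(two_paths))
```

Each extra step in a path lowers its confidence, while a second independent path raises the combined confidence, which matches the behaviour described above.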
The advanced model seems to be very well thought out. In practice one does not always want to deal
with such detailed estimates of trust. PGP 7.0 does not allow arbitrary confidence settings,
but only 4 steps: invalid, valid but not trusted, marginally trusted and fully trusted. When using
PGP myself I tend to only use fully trusted and untrusted, because it is easier to understand, and
I think most people do the same.
It is important to have your key signed by a lot of people if you want it to be easy for other
people to verify your key. The early users of PGP, the ’hackers’, went to key signing parties to get
those signatures: All party people bring their public keys, and get their key signed by all other
participants.
The information you give to PGP about trust is treated confidentially: Signatures are only
exported if the user explicitly allowed it, and the trust information is never exported. This is
important: You do not want your best friend to know you do not trust him because he is sloppy.
If one is willing to open the computer, things become interesting again. The data is still not
destroyed if overwritten once. Destruction of data or DoD is harder than it seems. By using the
right electronics one can find out what the previous data on a disk was by measuring the strength
of the signal. For instance: If the disk gives 0.98, it indicates a 1 overwritten by a one, 0.8 is a
zero overwritten by a one, 0.2 indicates a 1 overwritten by a 0, and 0.02 indicates a 0 overwritten
by a 0. If an attacker is even willing to open up the hard disk, he can measure the entire surface
of the disk, and the off-centre parts of a track. Peter Gutmann [7] wrote an article about it that
forms the basis of PGP’s file wiping routines. It involves overwriting the data several times with
different patterns, making it as hard as possible to retrieve previous values. This is really a task
for an operating system, but because the major OSes are not written for security PGP will do it
for you, since PGP 6.0.
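The idea of overwriting a file several times before deleting it can be sketched as follows. This is only an illustration: the pass list is my own choice and is neither the pattern list from Gutmann's paper nor the one used by PGP, and the function names are hypothetical. Note also that overwriting through the file system gives no guarantee that the same physical sectors are hit, for instance on journaling file systems or flash storage.

```python
import os
import secrets

# Illustrative pass list: two fixed byte patterns, then one random pass (None).
# These are NOT the patterns from Gutmann's paper or from PGP.
DEFAULT_PASSES = (b"\x55", b"\xaa", None)


def overwrite_file(path, passes=DEFAULT_PASSES, chunk_size=64 * 1024):
    """Overwrite the contents of a file in place, once per entry in passes."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for pattern in passes:
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                block = secrets.token_bytes(n) if pattern is None else pattern * n
                f.write(block)
                remaining -= n
            f.flush()
            os.fsync(f.fileno())    # ask the OS to push this pass to the disk


def wipe_file(path, passes=DEFAULT_PASSES):
    """Overwrite a file several times and then delete it."""
    overwrite_file(path, passes)
    os.remove(path)
```

Calling wipe_file("secret.txt") on an existing file would overwrite it three times and then remove it; the file name is of course just an example.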
The same paper [7] also tells how to destroy data from memory, but this theory is not brought
into practice by PGP. I guess the PGP makers think these attacks are not likely.
much information about you. PGP does not protect against this. If you are afraid of this, you
should send email indirectly (send the message encrypted to a friend and ask him to send it through
with a random time delay), or set up a mail server to do this.
1.6.7 TEMPEST
TEMPEST (no acronym, but derived from Shakespeare, or maybe it means Transient Electromagnetic
Pulse Emanation Standard) stands for collecting secret information by 'listening' to the radiation
emitted by electronic equipment. It dates back to the 1960s and military equipment is often
shielded against these attacks. TEMPEST attacks do not happen often, since they take expensive
equipment, but they are a possible threat nevertheless. See www.tempest.org for more information.
The best results are obtained by listening for radiation emitted by cathode ray tube screens
(ordinary glasstube monitors). PGP has the option to show messages in a special font with
smooth edges and irregular characters that makes the TEMPEST signals harder to catch. Check
http://www.eskimo.com/~joelm/tempest.htm for more information.
Figure 1.1: the TEMPEST resistant viewer
Chapter 2
The History of PGP
The first half of this chapter is mostly based on [6]. Other information comes from various internet
resources, which can be found on the CD-ROM.
which protect all public key cryptography, according to Cylink. Some people think those MH-
patents are invalid, because the patented technology is not working: the knapsack system described
in the patent is broken.
Phil Zimmermann needed a license to use the RSA algorithm in his program. The company
still exists. The name has been changed to RSA Security (website: http://www.rsasecurity.com/).
2.2 Guerrillaware
2.2.1 PGP 1.0
Zimmermann went on programming and in 1991 his program, PGP, was nearly finished. In April
1991 he wrote a letter to RSADS asking for ’the free license they had spoken about’. Jim Bidzos
refused to give this license. Phil Zimmermann could have bought a license, but that would mean
that PGP could not be distributed as shareware like he wanted to do. So there seemed to be a
problem. At the same time something else happened: the American government tried to forbid
cryptography in the 1991 Anti-Crime Bill S.266. The relevant paragraph was removed after an
industry-wide offensive, but Zimmermann felt really attacked by this law: He had invested years of
his life in PGP, was almost losing his house, was frustrated by RSADS, and now the government
tried to illegalize his program. In June he quickly ’released’ his program in the following way: He
gave a copy to a friend, who put it on the internet. The software was released by the imaginary
company Phil’s Pretty Good Software (PPGS).
RSADS, in the person of Jim Bidzos, was not very happy about it. Although the company was
not very threatened by the program, PGP set a bad example, and other people might also decide
not to pay for the use of RSA. Jim Bidzos also had a personal dislike against Phil Zimmermann,
so RSADS threatened to sue Zimmermann for spreading a program that illegally used a patented
algorithm. Zimmermann defended himself by saying that not he but a friend distributed the
program. They made an agreement that Zimmermann would not spread any more copies of his
PGP program, and that RSADS would not sue him for the copies already distributed. But there
was no way to stop the success of the program: people continued using PGP.
Version 1.0 of PGP used RSA as public key algorithm, Bass-o-Matic for conventional encryption,
MD4 as hash algorithm, and LZHUF as data compression algorithm. The Bass-o-Matic algorithm
was created by Zimmermann himself, and at the annual Crypto conference held in August 1991,
he discovered that it was insecure. (He designed a cipher himself because DES, the best known
cipher, had too small a key size and he knew no alternative.)
Besides this security problem, this version of PGP also suffered from a few legal problems:
• It uses the patented RSA algorithm. The patent was only valid in the US, because only in
the US it is possible to obtain a patent after publication.
• PGP contains cryptography, and cryptography falls under the International Traffic in Arms
Regulations. It was illegal to export cryptographic software (except in book form, strangely
enough).
• Other countries have other rules against importing and using cryptography. These however
were not very relevant for the development of PGP in this stage.
The US government was tipped off by Jim Bidzos that Zimmermann was infringing software
patents. During the investigation, they changed subject: They tried to prosecute him for violating
the export restrictions. PGP users started a campaign against the export rules, and to fund
Zimmermann’s legal expenses. The US government finally gave up the investigation in early 1996.
• PGP 2.63uin
• PGP 2.6.3CKT
2.2.9 PGPPhone
Phil Zimmermann also tried to make a program for making secure telephone calls. PGPPhone is
an attempt to make encrypted phone calls over the internet. It did not work very well, mostly
because internet telephony requires a very, very good internet connection. On March 21, 1999 the
sourcecode of the program was released because the new owner of everything PGP, NAI, did not want
to spend more time on the program, and after that not much has happened.
2.3. PGP AS A NORMAL PROGRAM 23
PGP For Personal Privacy Freeware 5.0 A MIT release. It can use, but not generate RSA
keys.
PGP 5.0i The international freeware version. It features RSA generation and use. This version
was the first version exported out of the US without breaking the export rules. PGP Inc
printed and published the program in book form (10 books, 5000 pages) and the books were
exported to Europe (exporting cryptographic software on paper was not forbidden). A team
of volunteers, led by Stale Schumacher from Norway, converted the books back to bytes by
means of scanners and OCR software (70 volunteers, 1000 hours of work). The website of
these volunteers is: www.pgpi.org. The latest version exported this way is 6.5.1i, because in
1999 the US government lifted the export controls.
PGP 5.0ic This version is distributed by PGP Europe. This was an independent company, not
affiliated with PGP Inc, but with a license to use their trademarks. It features RSA generation
and use. I assume the company no longer exists, because I have never read about it again.
Version 5 also supported the use of key servers: The program could automatically connect to
internet servers that maintained a database for public keys.
The RSA-marked versions can use and generate both kinds of keys; the DH-marked versions only
work with Diffie-Hellman keys.
A bug was discovered in the handling of RSA keys larger than 2048 bits. NAI fixed it in the
next version:
PGPDisk
The business version of PGP 6.0 includes PGPDisk: This is a separate program that can be used
to create virtual disks that are encrypted entirely. This program can be used without the rest of
PGP, but uses the same encryption techniques. (It uses no public key encryption.) PGPDisk is
included in all commercial versions after this version, but not in all freeware versions.
This version has had a few subversions that offered minor improvements:
• PGP 6.5.2 Support for Windows 2000, and Intel RNG support.
• PGP 6.5.3 First version legally exported in executable form. No new features
PGP 7.0.1
Most important new feature is the introduction of a new algorithm for conventional encryption:
AES.
• PGP 5.5.3ckt
• PGP 6.0.2ckt
• PGP 6.5.8ckt
2.4. THE FUTURE OF PGP 27
packages ever distributed. It is simply too much to list, and the differences are not very shocking.
Looking at the version number will give the approximate release, and the code after the version
number gives a hint about the features (noRSA versions do not support RSA, for instance). Small
version number changes indicate bug fixes.
name       date                legal status     maker       features
PGP 1.0    June 5 1991         illegal          PPGS        Bass-o-Matic
PGP 1.4    January 19 1992     illegal          PPGS
PGP 1.5    February 12 1992    illegal          PPGS
PGP 1.6    February 24 1992    illegal          PPGS
PGP 1.8    March 29 1992       illegal          PPGS
PGP 1.8a   May 23 1992         illegal          PPGS
PGP 2.0    September 2 1992    illegal          volunteers  IDEA
PGP 2.1    December 6 1992     illegal          volunteers
           February 17 1993    criminal investigation starts
PGP 2.2    March 6 1993        illegal          volunteers
PGP 2.3    June 13 1993        illegal          volunteers
PGP 2.3a   July 1 1993         illegal          volunteers
PGP 2.4.x  November 6 1993     commercial       ViaCrypt
PGP 2.5    May 5 1994          free             MIT         uses RSAREF library
PGP 2.6    May 22 1994         free             MIT         incompatible with previous
PGP 2.6.x  1994-1995           free/illegal     many
           January 11 1996     investigation stops
PGP 4.0    March 12 1996       commercial       ViaCrypt
           March 21 1996       PGPPhone 1.0beta released by MIT
PGP 4.5    February 4 1997     commercial       ViaCrypt    corporate key
           April 29 1997       Merkle-Hellman patent expires
PGP 5.0    June 16 1997        free/commercial  PGP Inc     3DES DH CAST noRSA
PGP 5.5.3  November 26 1997    free/commercial  PGP Inc     ADK introduced (+bug)
           December 1 1997     PGP is sold to NAI
PGP 5.5.5  December 4 1997     free/commercial  NAI
PGP 6.0    September 1 1998    free/commercial  NAI
PGP 6.02   November 12 1998    free/commercial  NAI         RSA support in all editions
PGP 6.5.1  April 5 1999        free/commercial  NAI
           January 2000        export restriction lifted
PGP 6.5.3  January 31 2000     free/commercial  NAI
           August 2000         Ralf Senderek discovers ADK bug
PGP 6.5.8  August 25 2000      free/commercial  NAI         ADK-bug fixed
           September 20 2000   RSA patent expires
PGP 7.0    September 10 2000   free/commercial  NAI         no source code
PGP 7.0.3  February 2 2001     free/commercial  NAI         minor bugfixes
           February 19 2001    Zimmermann leaves NAI
Table Notes
noRSA This means that certain editions of this version did not have full RSA support.
Chapter 3
Algorithms in PGP
Block ciphers Symmetric encryption algorithms do normal encryption, where the same password
or key is needed for encryption and decryption. Block ciphers are symmetric ciphers
that operate on blocks of, say, 8 or 16 bytes at a time. The key also has a fixed size
(typically 16 bytes or 128 bits). Block ciphers are the workhorses of cryptography: they
are fast, easy to program and storage-space efficient (they do not expand the data), so they can
handle bulk data. Conceptually they replace a large secret (the data) with a small secret (the
key). It is up to advanced protocols to do something spectacular with the small secrets.
PK algorithms Public Key algorithms do the tricks with public and private keys. They can
be slow, and are also more vulnerable to special kinds of attacks. Therefore these keys are
only used to encrypt small pieces of random data, like randomly generated session keys of
block ciphers and the outcome of hash functions. Public key algorithms can normally be used
for encryption and signing.
Hash functions A hash function takes a variable length file and produces a short fixed size
hash or fingerprint of that file (often 16 or 20 bytes). They resemble block ciphers because
they are optimized for processing bulk data, and they replace a large doubt (whether the
file is the same) with a small doubt (whether the hash is the same). Again it is up to other
algorithms to do something with the fingerprint.
Fingerprints have two important properties: an attacker cannot find, for a given fingerprint,
a file with that fingerprint, and an attacker cannot find two files with the same fingerprint
(cannot find in the practical sense: theoretically such files exist, but because there are so many hash
values an attacker will not have the time to generate enough fingerprints to find a match).
Because of these properties it is enough to sign the hash of a file, instead of the whole file.
Hash functions are also used to convert pass phrases (which vary in length) to keys for block
ciphers (which have a fixed length), and to convert data with some randomness to data looking
truly random. PGP can calculate a fingerprint for a public key. You can check that instead
of the whole key when verifying a public key.
Secret sharing algorithms These algorithms are not needed for the basic functions of PGP,
but can be used in the later versions. Secret sharing algorithms can split a private key into
any number of blocks m, such that if you bring any n of these blocks together you have the
private key again, but you cannot find this key with fewer than n blocks (n ≤ m). A typical use
is in a company with 3 executive officers, where any two of the three together are allowed
to sign a contract.
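As an illustration of the n-out-of-m idea, here is a minimal sketch of Shamir's polynomial secret sharing. It is an assumption that this scheme is a fair stand-in: the text above does not say which scheme PGP uses, and the names `split_secret`, `recover_secret` and the modulus `PRIME` are hypothetical.

```python
import secrets

# Assumption: a fixed prime modulus larger than any secret we split.
PRIME = 2**127 - 1  # a Mersenne prime

def split_secret(secret, n, m, prime=PRIME):
    """Split `secret` into m shares such that any n of them recover it."""
    # Random polynomial of degree n-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(n - 1)]
    shares = []
    for x in range(1, m + 1):
        y = 0
        for c in reversed(coeffs):          # Horner evaluation modulo prime
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares

def recover_secret(shares, prime=PRIME):
    """Lagrange interpolation at x = 0 over the integers modulo prime."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % prime
                den = den * (xi - xj) % prime
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret

# Three executive officers, any two of whom can reconstruct the key.
shares = split_secret(123456789, n=2, m=3)
```

With fewer than n shares the polynomial is underdetermined, which is why a single officer learns nothing about the key in this sketch.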
3.2.2 IDEA
IDEA started its life in 1991, under the name IPES (Improved Proposed Encryption Standard). In
1992 the name was changed to International Data Encryption Algorithm. Its creators are Xuejia
Lai and James Massey. It is a 64-bit block cipher with a 128-bit key. Its design is based on mixing
bitwise exclusive OR (⊕), addition modulo $2^{16}$ and multiplication modulo $2^{16} + 1$ (the famous prime
65537). It is fast in software (all personal computer processor chips can do multiplication in a single
instruction), but less efficient in hardware (the small processors used in chip cards do not have such
instructions). IDEA is patented, and the patent is held by Ascom-Tech AG in Switzerland.
There are no practical attacks known against the complete IDEA algorithm. If you intend to
attack PGP, you can forget breaking any of the block ciphers. This is not the weakest point.
Gory details
This section gives a detailed description of IDEA, for those people who want to know how a block
cipher works. Reading this section is not needed to understand the rest of this paper.
• The key is expanded to give 52 subkeys of 16 bits each. For each of the 8 rounds there are six
subkeys S1, S2, S3, S4, S5 and S6, and for the final output transformation we get another
four subkeys S1, S2, S3 and S4. Then the following steps happen in each of the eight rounds:
– The 64-bit input of this round is divided into four 16-bit numbers: X1, X2, X3 and X4.
– A = X1 ∗ S1. This is multiplication modulo 65537. The 16-bit value 0 stands for $2^{16}$, so 0 multiplies like -1.
– B = X2 + S2. This is addition modulo $2^{16}$.
– C = X3 + S3
– D = X4 ∗ S4
– E = A ⊕ C
– F = B ⊕ D
– G = E ∗ S5
– H = F + G
– I = H ∗ S6
– J = G + I
– K = A ⊕ I
– L = C ⊕ I
– M = B ⊕ J
– N = D ⊕ J
• The output of each round consists of the blocks K, M, L and N, in this order. After each round
except the last the two inner blocks are swapped, so the next round starts with
(X1 X2 X3 X4)_{i+1} := (K L M N)_i.
• After the eighth round a final output transformation is applied to the unswapped output
(X1 X2 X3 X4) := (K M L N) of that round (with new subkeys S1, S2, S3 and S4):
– W = X1 ∗ S1
– X = X2 + S2
– Y = X3 + S3
– Z = X4 ∗ S4
What the subkeys are depends on whether you are encrypting or decrypting. If you are en-
crypting you just take for the first 8 subkeys the bits of the key K, divided in 16 bit groups, then
you shift the key 25 bits to the left, take again 8 subkeys, shift 25 bits left, ... until you have all the
subkeys. Table 3.2.2 summarizes this. K0−16 means taking bits 0 to 16 (exclusive) from the key.
If you are decrypting the subkeys are inverses of the encryption subkeys, but in a backwards
order. The idea is that you try to do exactly the inverse operation of what happened at the
encryption. Let us define E(1,2) or shorter E12 as the second encryption subkey from round 1, and
-X as the additive inverse of X, and X’ as the multiplicative inverse of X. Then table 3.2 gives the
decryption subkeys.
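To make the round structure and the encryption key schedule concrete, the following is a minimal sketch of IDEA encryption of one block. It follows the published IDEA definition, in which X1 and X4 are multiplied and X2 and X3 are added to their subkeys, and in which the inner swap of the last round is undone before the output transformation. The function names are my own, and the sketch is meant for reading, not for production use.

```python
MASK16 = 0xFFFF

def mul(a, b):
    """Multiplication modulo 65537, where the 16-bit value 0 stands for 2**16."""
    a = a or 0x10000
    b = b or 0x10000
    return (a * b % 0x10001) & MASK16

def add(a, b):
    """Addition modulo 2**16."""
    return (a + b) & MASK16

def expand_key(key):
    """Derive the 52 encryption subkeys from a 128-bit integer key."""
    subkeys = []
    while len(subkeys) < 52:
        # Take eight 16-bit groups, most significant first ...
        for i in range(8):
            subkeys.append((key >> (112 - 16 * i)) & MASK16)
        # ... then rotate the whole key 25 bits to the left.
        key = ((key << 25) | (key >> 103)) & ((1 << 128) - 1)
    return subkeys[:52]

def idea_encrypt_block(block, key):
    """Encrypt one 64-bit block (as an integer) under a 128-bit key."""
    s = expand_key(key)
    x1, x2, x3, x4 = [(block >> shift) & MASK16 for shift in (48, 32, 16, 0)]
    for r in range(8):
        s1, s2, s3, s4, s5, s6 = s[6 * r:6 * r + 6]
        a, b = mul(x1, s1), add(x2, s2)
        c, d = add(x3, s3), mul(x4, s4)
        e, f = a ^ c, b ^ d
        g = mul(e, s5)
        h = add(f, g)
        i = mul(h, s6)
        j = add(g, i)
        # Output with the two inner blocks already swapped.
        x1, x2, x3, x4 = a ^ i, c ^ i, b ^ j, d ^ j
    x2, x3 = x3, x2                      # undo the swap of the last round
    s1, s2, s3, s4 = s[48:52]
    w, x, y, z = mul(x1, s1), add(x2, s2), add(x3, s3), mul(x4, s4)
    return (w << 48) | (x << 32) | (y << 16) | z
```

Decryption would reuse the same round function with the inverted subkeys in reverse order; that part is left out to keep the sketch short.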
IDEA is similar to other block ciphers, except for one thing: it does not have S-boxes. An S-box
is a lookup table that acts as a very random function: it must not be linear or otherwise systematic.
The idea is that XOR and addition diffuse the information, while the S-boxes do the real obscuring.
Often the S-boxes determine the strength of the cipher against cryptanalytic attacks. IDEA uses
the multiplication modulo 65537 in place of an S-box. This is a very chaotic function, so in effect
IDEA has a very large S-box that it does not have to store.
3.2.3 CAST
Carlisle Adams and Stafford Tavares created the CAST blockcipher. They claim that the name refers
to the design procedure, but it seems more likely that they deliberately named it after their initials.
The exact variant of CAST used in PGP is CAST5-128 as defined in RFC 2144[1]. It is a Feistel
cipher with 128 bit key and 16 rounds, and it operates on 64 bit blocks.
The original CAST description did not specify what S-boxes should be used. The RFC defines
them. The document gives a description of CAST-128: The 128 bit variant of the algorithm. This
variant can also be used with reduced key sizes. To differentiate these variants, CAST-128 is also
called CAST5, and the actual key size can be postfixed. PGP uses the largest keys possible: 128
bits. Hence CAST5-128.
3.2.4 3DES
DES stands for Data Encryption Standard. This encryption algorithm was created in 1977 by the
US government (NIST and NSA) based on work done by IBM. DES is a 64 bit blockcipher with a 64
bit key. Unfortunately, the last bit of every byte in the key is a parity bit, so effectively there is only
a 56 bit key. This is far too short: a brute force attack with a dedicated DES cracking machine
would take less than a day. Triple DES or 3DES applies DES three times, with different keys. So triple
DES is a 64 bit block cipher with 168 key bits (plus 24 parity bits). The original DES algorithm
has been studied for a very long time, and therefore experts consider 3DES to be very secure. The
drawback is that it is significantly slower than all other algorithms: DES by itself is slow because it
uses bit permutations that are easy in dedicated chips but slow on general purpose computers, and
because you need to do three operations to get the security of two operations. The only reason for
using Triple DES is that it is very well studied. It is a cipher for conservative people.
3DES is the most interesting block cipher in PGP. Multiple encryption makes ciphers stronger,
but it is not problem free. An attack on double encryption is described below. DES itself is also
not as good as the other ciphers. There are attacks known against the 16 round DES encryption,
making 3DES in theory the most doubtful cipher in PGP.
Differential Cryptanalysis
Differential cryptanalysis can find a DES key using a chosen plaintext attack with $2^{47}$ chosen
plaintexts. A chosen plaintext attack is an attack where the attacker can submit plaintexts of his
choice and receive the encryption. This is a very theoretical attack, because it is not feasible to
collect that much plaintext. The S-boxes in DES are designed to be very strong against differential
cryptanalysis, so one can conclude that the NSA knew about this technique back in 1977. The rest
of the world heard about it in 1990 from Eli Biham and Adi Shamir.
Linear Cryptanalysis
Linear cryptanalysis is even more powerful against DES. It is a similar but distinct technique,
discovered by Mitsuru Matsui in 1993. It can do a known plaintext attack against DES using
$2^{43}$ plaintexts. For a known plaintext attack, the attacker has access to a certain amount of
plaintext and the corresponding ciphertext, but cannot choose what the plaintext is. This attack
is almost practical, although $2^{43}$ is still a lot. This attack does not generalize to 3DES, so it has
no practical value against PGP, but it does show that a blockcipher need not be secure, even if it
For each possible key k, he encrypts and stores $E_k(P_1)$. When finished, he computes for all
keys l the expression $D_l(C_1)$ and looks the result up in memory: if he finds a match $E_k(P_1) = D_l(C_1)$
then kl is a likely candidate for the key. Assuming that the attacker has a few more known plaintexts
he can check whether it is correct, and try more l values if it is not.
Although the extra memory requirement of $2^{56}$ blocks is not completely trivial, it is clear that
double encryption does not offer much extra security. For triple encryption with n-bit keys there is a
similar meet-in-the-middle attack, and this attack takes $2^{2n}$ steps (and $2^n$ memory blocks). The first triple
encryption implementations had to be backward compatible with single encryption, so they used
encryption-decryption-encryption: $C = E_{k_3}(D_{k_2}(E_{k_1}(P)))$. This does not affect the security, but if
you take $k_1 = k_2 = k_3$ it is equivalent to single encryption.
To complicate things there is also a triple encryption scheme that needs only two keys:
$C = E_{k_1}(D_{k_2}(E_{k_1}(P)))$. This is not a bad scheme at all if you are short on key storage space, but it is
not as secure as triple encryption. See [18] page 359.
PGP uses 3DES with a 168 bit key (real key size) in EDE mode. This gives an effective strength
of 112 bits.
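The meet-in-the-middle attack on double encryption can be tried out on a toy scale. The sketch below assumes an invented 16-bit block cipher with a 10-bit key (`toy_encrypt` and `toy_decrypt` are hypothetical and have no cryptographic value); only the structure of the attack matters here.

```python
from collections import defaultdict

MASK = 0xFFFF
KEY_BITS = 10
MULT = 40503                       # odd, hence invertible modulo 2**16
MULT_INV = pow(MULT, -1, 1 << 16)

def rotl(x, r):
    return ((x << r) | (x >> (16 - r))) & MASK

def rotr(x, r):
    return ((x >> r) | (x << (16 - r))) & MASK

def toy_encrypt(k, p):
    """Invented toy cipher: three rounds of add-key, multiply, rotate."""
    x = p
    for r in range(3):
        x = (x + k + r) & MASK
        x = (x * MULT) & MASK
        x = rotl(x, 5)
    return x

def toy_decrypt(k, c):
    x = c
    for r in reversed(range(3)):
        x = rotr(x, 5)
        x = (x * MULT_INV) & MASK
        x = (x - k - r) & MASK
    return x

def meet_in_the_middle(pairs):
    """Find (k, l) with E_l(E_k(P)) = C for all known (P, C) pairs."""
    p1, c1 = pairs[0]
    table = defaultdict(list)
    for k in range(1 << KEY_BITS):          # store E_k(P1) for every k
        table[toy_encrypt(k, p1)].append(k)
    candidates = []
    for l in range(1 << KEY_BITS):          # look up D_l(C1) for every l
        for k in table.get(toy_decrypt(l, c1), ()):
            # Confirm the candidate on the remaining known plaintexts.
            if all(toy_encrypt(l, toy_encrypt(k, p)) == c for p, c in pairs[1:]):
                candidates.append((k, l))
    return candidates
```

The work is about 2 × 2^10 cipher operations instead of the 2^20 that a naive search over both keys would need, at the price of a table with 2^10 entries.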
3.2.5 Rijndael/AES
In 2000 the blockcipher Rijndael was elected to be the new Advanced Encryption Standard. It
replaces the DES standard. PGP 6.5.8 does not have AES support, but versions after PGP 7.0 do.
The AES is a very new cipher, and no attacks are known.
Rijndael has been designed by Joan Daemen and Vincent Rijmen. It is an unorthodox design:
It uses operations on polynomials, and decryption is not as fast as encryption. It has a block size
of 128 bit (twice the size of the other ciphers) and a key size of 128, 192 or 256 bit. For more
information check http://csrc.nist.gov/encryption/aes/rijndael/
instance) they will be visible in the output. Another problem is that the same input file will give
the same output each time. And last but not least: an attacker can make predictable changes to
the data by exchanging or repeating blocks.
To overcome these problems three other modes exist: CBC, CFB and OFB. PGP uses CFB
mode, so I will discuss that one. CFB stands for Cipher Feedback Mode. The CFB-mode used by
PGP is designed by PRZ, and differs slightly from the standard CFB-mode. It is described in [3].
CFB
The CFB algorithm uses some extra storage, named a feedback register, whose size is equal to
the block size of the cipher used (often 8 bytes). To encrypt a block, it
is simply XOR-ed with the contents of the register. If the 'block' you want to encrypt is not a
complete block, but is shorter than the feedback register, you use only the first
bytes of the register. After using (part of) the register it is refilled with the encryption of the last
block of the output. If only a part has been used and it is not the end of the input, this is called a
sync. Let $C_i$ be the ith block of encrypted text or ciphertext, R the register, and $P_i$ the ith block
of plaintext.
$C_i = P_i \oplus R; \quad R = E(C_i)$ (3.2)
To start encrypting, the register is filled with the encryption of the all-zero block. Then an
initial value (IV) consisting of one block of random bytes is encrypted in CFB mode: the output is
discarded, but the register is now set. Then the first two bytes of the random block are encrypted
again. This forces a sync. Now the data can be encrypted.
The non-standard part of PGP's CFB is that the register's initial value is not simply set, but
obtained by processing random data, and the fact that it is allowed to process shorter incomplete
blocks at any time: when encrypting multiple big numbers, a sync is done after each number.
These small peculiarities do not affect the security. The repetition of the first two random bytes
provides some sort of checksum: it allows you to detect that you are using the wrong key. A nice
property of this mode is that you do not have to pad the last bytes to make a full block.
A severe weakness of CFB is that one can make predictable changes to the last block of bytes: If
you flip bits in the last part of the encrypted data, the corresponding bits of the decrypted plaintext
also have changed. These changes will often be detected by PGP if the encrypted data is somehow
signed. Sometimes the data is not signed: If you used PGP to password encrypt a file, it is not
completely protected against changes. A secret keyring is also encrypted, but not signed. This is
a severe problem because all cipher modes are designed to protect against changes, and it is a pity
that this protection fails for the last block. The Klima-Rosa attack on page 66 exploits this.
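The CFB recurrence of equation (3.2) and the bit-flipping weakness in the last block can be demonstrated with a few lines of code. This is a minimal sketch of plain CFB, not of PGP's variant: the random prefix, the repeated two bytes and the sync are left out, the register simply starts as E(IV), and `E` is a hypothetical stand-in built from SHA-256 (CFB only ever uses the encryption direction, so a keyed one-way function is enough for a demonstration).

```python
import hashlib

BLOCK = 8  # block size in bytes

def E(key, block):
    """Stand-in for the block cipher's encryption direction (not a real cipher)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a, b):
    # zip() stops at the shorter input, so a short final block
    # uses only the first bytes of the register.
    return bytes(x ^ y for x, y in zip(a, b))

def cfb_encrypt(key, iv, plaintext):
    register = E(key, iv)
    out = bytearray()
    for i in range(0, len(plaintext), BLOCK):
        c = xor(plaintext[i:i + BLOCK], register)   # C_i = P_i xor R
        out += c
        register = E(key, c)                        # R = E(C_i)
    return bytes(out)

def cfb_decrypt(key, iv, ciphertext):
    register = E(key, iv)
    out = bytearray()
    for i in range(0, len(ciphertext), BLOCK):
        c = ciphertext[i:i + BLOCK]
        out += xor(c, register)                     # P_i = C_i xor R
        register = E(key, c)
    return bytes(out)

key, iv = b"k" * 16, bytes(range(BLOCK))
message = b"Pay Bob 100 dollars"
ciphertext = cfb_encrypt(key, iv, message)
# Flip the lowest bit of the last ciphertext byte without knowing the key.
tampered = ciphertext[:-1] + bytes([ciphertext[-1] ^ 0x01])
```

Decrypting `tampered` gives the original message with exactly the same bit flipped in its last byte, and nothing in the mode itself reveals that a change was made.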
A more theoretical disadvantage is that CFB is vulnerable to a chosen ciphertext attack
described in [11]. In a chosen ciphertext attack the attacker can obtain the decryption of any block of
data, except the one he is attacking. This is quite a strong assumption, but in certain
implementations it could be the case. The attacker can send in the original message, with arbitrary bytes
instead of the last block, and from that decryption he can determine what the last block must be
XORed with. This attack is not a serious threat for PGP, but could be interesting in other uses of
CFB.
CBC
Another common mode is CBC or Cipher Block Chaining mode. PGP does not use it for encryption,
but it is used inside the random number generator (see page 77). It also uses a register R. R
always contains the last output. A plaintext block is encrypted by XORing it first with R, and then
encrypting it.
$C_i = E(P_i \oplus R); \quad R = C_i$ (3.3)
This mode does not allow using a partial block at the end of the message. Any change made in the
ciphertext will corrupt an entire block.
3.3.1 RSA
RSA is the most famous public key algorithm. It was used in the first version of PGP. RSA is based
on number theory: It uses large prime numbers. It was invented by Ronald Rivest, Adi Shamir
and Leonard Adleman, who first published the algorithm in April, 1977. This algorithm became
popular despite patent problems.
Key generation
One always makes a key of a certain size. A bigger key will be slower, but it will also be safer against
a factoring attack. There is an ongoing improvement in factoring techniques, so the recommended
keysize is slightly increasing. At the current moment it is 1024 bits. The size of a key is the size of
the number n expressed in bits. The user chooses at random
• p a large prime number, half the size of n (say 512 bits). p is chosen as one of the first
primes next to a randomly chosen number, and it starts with the bits 11, to ensure the size
of n. PGP does not always choose the first prime number, but shuffles the 256 elements above the
starting point so that primes above a large primeless gap do not have too big a chance of
being chosen.
• q another prime number (again 512 bits). In PGP q is chosen not too close to p. q is larger
than p (if not swap the numbers). It also starts with the bits 11.
• e any convenient number relatively prime to p − 1 and to q − 1. Small numbers with low
Hamming weight are faster to calculate with, so common values are 3, 17 and 65537. 3 is a
very risky value, see the section on low exponent attacks, so the new versions use 65537. If
the value of e you want is not relatively prime to p − 1 and q − 1, choose new values for p
and q.
The user now calculates
• n = p ∗ q. The size of n is the size of the key. It will be the size of p plus the size of q, because
they both start with 11 in PGP.
• $\varphi(n) = (p - 1)(q - 1)$. This means that $m^{\varphi(n)+1} \bmod n = m$, for every m. If you check
the sourcecode of PGP, you will see that instead of (p − 1) ∗ (q − 1) they use
lcm(p − 1, q − 1). This is slightly faster, because it is a smaller number. It can give a
different (smaller) d, but that d decrypts and signs equally well, so it does not matter.
• $d = e^{-1} \bmod \varphi(n)$.
• $u = p^{-1} \bmod q$
The public key consists of (n, e). The private key is (n, e, d, p, q, u).
Usage
• encryption of a short message x: Convert x to a large number m, 1 < m < n and calculate
the encrypted text $C = m^e \bmod n$.
• signing a hash value h: convert h to a large number m, 1 < m < n and calculate
$S = m^d \bmod n$.
• decryption: calculate $m = C^d \bmod n$.
• signature verification: check that $m = S^e \bmod n$.
The calculations like $S = m^d \bmod n$ can be done faster, if you possess the private key, by first
calculating $s_1 = m^d \bmod p$ and $s_2 = m^d \bmod q$. This is faster because p and q are smaller than n.
Now we have $S = s_1 + p \cdot u \cdot (s_2 - s_1) \bmod n$. In the literature this is called an application of the
Chinese Remainder Theorem.
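The key generation and the Chinese Remainder shortcut can be checked with toy numbers. The sketch below uses the tiny primes 53 and 61 purely for illustration (real keys use primes of hundreds of bits), and the function names are my own.

```python
from math import gcd

p, q, e = 53, 61, 17               # toy primes with q > p; e coprime to p-1 and q-1
n = p * q
phi = (p - 1) * (q - 1)
lam = phi // gcd(p - 1, q - 1)     # lcm(p - 1, q - 1)

d = pow(e, -1, phi)                # private exponent from (p-1)(q-1)
d_lcm = pow(e, -1, lam)            # private exponent from the lcm
u = pow(p, -1, q)                  # u = p^-1 mod q, stored in the private key

def sign_plain(m):
    """S = m^d mod n, computed directly."""
    return pow(m, d, n)

def sign_crt(m):
    """The same S, computed modulo p and q separately and then recombined."""
    s1 = pow(m, d, p)
    s2 = pow(m, d, q)
    return (s1 + p * u * (s2 - s1)) % n

def verify(m, s):
    return pow(s, e, n) == m
```

In this example the two private exponents are different numbers, yet `pow(c, d, n)` and `pow(c, d_lcm, n)` give the same result for every ciphertext c.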
This description of RSA is a little longer than the one in most textbooks, because it incorporates
certain details from inside PGP (like the extra variables in the private key).
There are many ways to convert messages x to large numbers. Not all of these are equally good.
The method used by PGP is described in section 3.3.4.
pair of squares.
The QS runs subexponential and a variation of it factored a 428 bit number in 1994. The
Number Field Sieve is an even faster algorithm.
$c_1 = m^3 \bmod n_1$ (3.4)
$c_2 = m^3 \bmod n_2$ (3.5)
$c_3 = m^3 \bmod n_3$ (3.6)
To prevent this attack never encrypt the same number twice. Always include some fresh random
bytes into every operation. This attack also works against other low values of e, but the attacker
always needs e encryptions to e different people to apply this method, so this attack is harder to
carry out if e = 65537.
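Equations (3.4) to (3.6) can be turned into a small experiment. This is a minimal sketch assuming that the same unpadded number m is encrypted with e = 3 under three pairwise coprime moduli: the Chinese Remainder Theorem then gives $m^3$ modulo $n_1 n_2 n_3$, and because $m^3$ is smaller than that product an ordinary integer cube root recovers m. The helper names and the toy moduli are my own.

```python
def crt(residues, moduli):
    """Combine x = r_i (mod n_i) for pairwise coprime n_i into one residue."""
    x, n = 0, 1
    for r, m in zip(residues, moduli):
        # Choose t so that x + n*t is congruent to r modulo m.
        t = ((r - x) * pow(n, -1, m)) % m
        x += n * t
        n *= m
    return x, n

def icbrt(v):
    """Largest integer whose cube is at most v (binary search)."""
    lo, hi = 0, 1
    while hi ** 3 <= v:
        hi *= 2
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= v:
            lo = mid
        else:
            hi = mid - 1
    return lo

def recover_message(ciphertexts, moduli):
    """Recover m from three encryptions of the same m with e = 3."""
    m_cubed, _ = crt(ciphertexts, moduli)
    m = icbrt(m_cubed)
    if m ** 3 != m_cubed:
        raise ValueError("not a perfect cube: the messages probably differed")
    return m

moduli = [53 * 59, 71 * 83, 89 * 101]      # toy RSA moduli, pairwise coprime
secret = 1234
ciphertexts = [pow(secret, 3, n) for n in moduli]
```

No factoring is involved at any point, which is why fresh random bytes in every encryption are the real defence.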
Key generation
• A prime p is chosen. The size of p determines the size of the key. A typical size is 1024 bits.
• Find a generator g of a large enough subgroup of prime order (g is said to generate the set of
numbers $g^i \bmod p$). Large enough means at least 160 bits. You are looking for a g, 1 < g < p,
with $g^q \bmod p = 1$ for a prime number q. This is only possible for a q dividing p − 1.
• Find a secret number x. The size of x is chosen to be as small as is securely possible, to speed
up computations. The choice PGP makes is based on a paper by Michael Wiener. For a
2048 bit key, the size of x is 225 bits. The complete table of x sizes is listed in pgpKeyMisc.c.
It should be at least 160 bits, according to the security assumptions in this paper.
• y is calculated: $y = g^x \bmod p$
The public key is (p, g, y). The private key is (p, g, y, x). There is more than one way to do the
above. It is possible to search for p = 2 ∗ q + 1 with q also prime. In this case g = 2 will do. q is
called a Sophie Germain prime (and p a safe prime). This gives a very strong ElGamal key, but it takes a long time to
find such a prime, because they are rare. It is safe to share p and g with a group of users, so PGP
can use some precomputed values. This option is called fastgen. A third way is to take a random
prime q, a few bits shorter than the requested length of p (say 10 to 20 bits shorter). A few values
of k are tried to find a prime p such that p = 2 ∗ k ∗ q + 1. Now it is checked that $2^{(p-1)/q} \bmod p$ is not 1. If this is
the case, g = 2 is a good choice, otherwise another value for k is tried.
Encryption
m is the message padded in the PKCS way described in the section [10], to make it a positive
number smaller than p. The sender randomly selects a number k, 0 < k < p − 1. He computes
$a = g^k \bmod p$ and $z = m \cdot y^k \bmod p$. The encrypted message is (a, z).
Decryption
$m = (a^x)^{-1} \cdot z \bmod p$.
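The encryption and decryption formulas can be exercised with a toy key. The sketch below assumes the small prime p = 2039 = 2 ∗ 1019 + 1 with g = 2; these values are chosen only so the numbers stay readable, padding is left out, and the function names are my own.

```python
import secrets

p, g = 2039, 2                      # toy parameters, far too small for real use

def keygen():
    """Pick a secret x and compute y = g^x mod p."""
    x = secrets.randbelow(p - 3) + 2        # 2 <= x <= p - 2
    return x, pow(g, x, p)

def encrypt(m, y):
    """Return (a, z) with a = g^k mod p and z = m * y^k mod p."""
    k = secrets.randbelow(p - 2) + 1        # 0 < k < p - 1
    return pow(g, k, p), (m * pow(y, k, p)) % p

def decrypt(a, z, x):
    """m = (a^x)^-1 * z mod p."""
    return (pow(pow(a, x, p), -1, p) * z) % p

x, y = keygen()
a, z = encrypt(1234, y)
```

Because k is fresh for every message, encrypting the same m twice gives different pairs (a, z) almost every time.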
Security of ElGamal
This algorithm seems very secure. There are no ’low generator attacks’, so the only way of attacking
seems solving the Discrete Log problem. There is a close relation between solving the DL problem
and factoring. In fact one can say it is equally hard to solve the DL problem in $Z_p^*$ as factoring a
composite number the same size as p.
DL-algorithms can be divided in group-theoretic algorithms, which work on all the ElGamal
variants, even the ones on different groups, and the ones specific for $Z_p^*$. A specific algorithm is
the index calculus, which is the DL variant of the quadratic sieve. It is described in [20].
A group-theoretic algorithm is Shanks' baby step-giant step algorithm. Let
$t = \lceil \sqrt{\mathrm{upperbound}(x)} \rceil$. You want to know x with $y = g^x$ in a certain group. Write x = x1 ∗ t − x2 for
x1, x2 ≤ t. It holds that $y \cdot g^{x_2} = (g^t)^{x_1}$. Create a table T that holds, for all values of x1, $(g^t)^{x_1}$.
Next try calculating $y \cdot g^{x_2}$ for all values of x2 and look up the results in T. If you have a match,
you know x1 and x2, and thus you know x.
The running time of this algorithm is $\sqrt{\mathrm{upperbound}(x)}$. This is the best a group-theoretic
algorithm can do, so PGP is safe from these attacks, because it would mean doing more than $2^{80}$
steps.
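The baby step-giant step idea fits in a few lines. This is a minimal sketch for the group of integers modulo a small prime, writing x = x1 ∗ t − x2 with t just above the square root of the bound; the function name and the toy parameters are my own.

```python
from math import isqrt

def baby_step_giant_step(g, y, p, bound):
    """Find x with 0 <= x <= bound and g^x = y (mod p), or return None."""
    t = isqrt(bound) + 1
    giant = pow(g, t, p)
    # Table T: (g^t)^x1 -> x1 for x1 = 0 .. t
    table = {}
    value = 1
    for x1 in range(t + 1):
        table.setdefault(value, x1)
        value = (value * giant) % p
    # Try y * g^x2 for x2 = 0 .. t and look it up in T.
    current = y % p
    for x2 in range(t + 1):
        x1 = table.get(current)
        if x1 is not None and x1 * t - x2 >= 0:
            return x1 * t - x2
        current = (current * g) % p
    return None

p, g = 2039, 2
y = pow(g, 777, p)
x = baby_step_giant_step(g, y, p, bound=p - 1)
```

Both the table and the search loop have about $\sqrt{\mathrm{bound}}$ entries. For a 160 bit secret exponent that is on the order of $2^{80}$ group operations and as many table entries, which is what puts the attack out of reach.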
If you want to solve the DL problem in $Z_p^*$, and you know something about the factorisation
of the order of g, for instance ord(g) = a ∗ b ∗ c, you can solve the DL problem in a time depending
on the largest factor of ord(g). For this reason one should always be sure that ord(g) contains at
least one large prime factor. The algorithm to do this is called Pohlig-Hellman.
3.3.3 DSA
The Digital Signature Algorithm, part of the digital signature standard (DSS) , is a US government
standard for signing documents. It can only be used for signing. The main advantage over RSA
and ElGamal signature schemes is that it produces much smaller signatures. The DSS document,
[15] describing the DSA was published in 1994.
The DSA is a special variant of ElGamal, called subgroup-ElGamal. This version of ElGamal
needs a large group for doing the calculations in, 1024 bits, but the signature numbers are chosen
from a smaller subgroup ($2^{160}$ elements), and the designers have succeeded in making a protocol
where only 160 bits are needed to denote these elements of the subgroup. This makes DSA a variant
of ElGamal that generates quite small signatures: It takes two 160 bit numbers instead of two 1024
bit numbers.
The DSA has been created by the National Institute of Standards and Technology (NIST), but
they got help from the NSA. The standard got much criticism, because people were getting used
to RSA, and now the government tried to make them switch to DSA. It was also criticised
for just being a signature standard, not being an encryption standard. While it is probably true
that the NSA would not help people encrypting data, this does not make DSA a worse signature
standard.
The official key size of DSA varies from 512 to 1024 bit. This is not much: 512 bit is certainly
insufficient, and 1024 bit is the only reasonable value, so it will not last forever. The reasoning
behind the small key size is that powerful government adversaries, like the NSA will try to break
encryption, but not try to forge signatures: Once they start using this ability people will notice it
and it will not remain a secret that they can do it. So you need not fear governments but only less
powerful adversaries. Anyway, PGP always makes 1024 bits DSA keys, and this is secure enough.
Key generation
• Select the 160 bit prime number q
• Select the 1024 bit prime number p such that q|(p − 1).
• Find a and g such that $g = a^{(p-1)/q} \bmod p$ with $g \neq 1$.
• Compute $y = g^x \bmod p$, where x is the secret key.
Signature generation
• Select randomly a number k, 0 < k < q.
• Calculate $s = (k^{-1} \cdot (H(m) + x \cdot r)) \bmod q$. H(m) is the hash of the message.
Signature verification
• Verify that 0 < r < q and 0 < s < q
• Calculate u2 = t ∗ r mod q
Note that unlike RSA signatures, you cannot recover the hash value from this signature. This is
caused by the fact that this is subgroup ElGamal: the numbers given contain enough information
to do a check, but not to reconstruct the input.
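For readers who want to see the whole signing and verification cycle in one place, here is a minimal sketch of DSA with toy parameters (q = 101, p = 607). The steps for r and for the verification values u1 and v follow the published DSA standard rather than the lists above; the parameter values, the unreduced SHA-1 hash and the function names are my own choices for illustration.

```python
import hashlib
import secrets

q = 101                       # toy subgroup order (160 bits in real DSA)
p = 607                       # prime with q | (p - 1): 607 = 6 * 101 + 1
g = pow(2, (p - 1) // q, p)   # g = a^((p-1)/q) mod p with a = 2; g != 1

def H(message):
    """Hash of the message as an integer (SHA-1, as in the DSS)."""
    return int.from_bytes(hashlib.sha1(message).digest(), "big")

def sign(message, x):
    while True:
        k = secrets.randbelow(q - 1) + 1             # 0 < k < q
        r = pow(g, k, p) % q
        s = (pow(k, -1, q) * (H(message) + x * r)) % q
        if r != 0 and s != 0:                        # retry in the rare zero case
            return r, s

def verify(message, r, s, y):
    if not (0 < r < q and 0 < s < q):
        return False
    t = pow(s, -1, q)
    u1 = (H(message) * t) % q
    u2 = (t * r) % q
    v = (pow(g, u1, p) * pow(y, u2, p)) % p % q
    return v == r

x = 57                        # secret key, 0 < x < q
y = pow(g, x, p)              # public key
r, s = sign(b"a contract", x)
```

The signature is just the pair (r, s) of numbers below q, which is where the small signature size of DSA comes from.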
The signature does not contain information about the hash function used. The official stan-
dard specifies that SHA is used, but PGP specifies that any 160 bit hash function can be used.
Theoretically this could be a problem, if a weak hash algorithm is included in PGP (see section
4.4.1).
3.3.4 Padding
Padding is the process of converting messages into the big numbers algorithms like RSA and
ElGamal need. Often the data must be expanded, because the message is shorter than the number
needed. This padding is vital to the security, because most public key algorithms are not immune
to attacks with specially chosen numbers.
The padding described here is specially designed for use with RSA. It is also used for ElGamal.
It is described in [10]. The document appears in the PKCS series, a series of Public Key Cryptography
Standards maintained by RSA Security. It is document #1 of that series, and the version
used is version 1.5. This document is also an RFC, number 2313 [10]. There is a newer
version, RFC 2437.
A string of the right number of bytes is formed (this is size(n)/8, rounded upwards) by con-
catenating the parts 00, BT, PS, 00, M. Hexadecimal notation is assumed, so 00 means a null byte.
The parts have the following meaning:
• PS is the padding string. It has variable length, and consists of many FF bytes if BT is
01, or random nonzero bytes if BT is 02. In the latter case they must be really random not
pseudorandom.
• M is the message you want to include. In PGP, it is the session key followed by a
2 byte checksum, or the hash with a certain prefix to identify the hash algorithm.
This checksum is calculated by adding all message bytes modulo 65536.
Applications should check the checksum, and the FF’s in the padding string. The reason the
encryption has random bytes and the signing fixed bytes, is that when encrypting, you want to
prevent that attackers guess the number you encrypted, while this does not matter while signing.
For signing, you want to prevent that people can verify random numbers until something comes
out that looks like a signature: An attacker might want to do this to forge a signature. He cannot
show the document in this case, but in some applications even this kind of forgery is bad.
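The block layout 00, BT, PS, 00, M is easy to build by hand. The following is a minimal sketch of that padding; the function names are hypothetical, the minimum of 8 padding bytes comes from PKCS #1 version 1.5, and storing the checksum as two big-endian bytes is an assumption made for this example.

```python
import secrets

def checksum16(data):
    """Sum of all bytes modulo 65536."""
    return sum(data) % 65536

def pkcs1_pad(message, size, block_type):
    """Build 00 | BT | PS | 00 | M with a total length of `size` bytes."""
    ps_len = size - 3 - len(message)
    if ps_len < 8:
        raise ValueError("message too long for this key size")
    if block_type == 1:                       # signing: fixed FF bytes
        ps = b"\xff" * ps_len
    elif block_type == 2:                     # encryption: random nonzero bytes
        ps = bytes(secrets.randbelow(255) + 1 for _ in range(ps_len))
    else:
        raise ValueError("unsupported block type")
    return b"\x00" + bytes([block_type]) + ps + b"\x00" + message

def pkcs1_unpad(block):
    """Check the header and return M (everything after the first 00 in PS position)."""
    if block[0] != 0 or block[1] not in (1, 2):
        raise ValueError("bad padding header")
    end = block.index(0, 2)                   # first zero byte after BT
    if block[1] == 1 and block[2:end] != b"\xff" * (end - 2):
        raise ValueError("bad FF padding")
    return block[end + 1:]

session_key = bytes(range(16))
m = session_key + checksum16(session_key).to_bytes(2, "big")
block = pkcs1_pad(m, size=128, block_type=2)  # 128 bytes for a 1024 bit key
```

The leading 00 byte guarantees that the block, read as a number, is smaller than the modulus.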
One small potential weakness is that the signing operation contains no random padding. If
something goes wrong during a signing operation, the attackers exactly know what the input was,
and can compare the numbers to obtain information on the secret key. I recommend inserting 10
random bytes in the message when using this kind of padding in a PK algorithm like RSA, where
the message can be recovered from the signature. It can help avoid the attack on page 37.
To hash a certain input, the input is padded with a single 1 bit followed by zeroes to make it 64 bits short of a multiple
of 512 bits. Then the length of the original input is appended as a 64 bit number. Four 32 bit variables A, B, C
and D are initialized with special numbers defined in the standard. Each 512-bit input block
and the current values of A, B, C and D are fed into the compression function; the result, 128 bits
large, is split into four 32 bit numbers, and those numbers are added to A, B, C and D respectively.
When all blocks are processed, A, B, C and D form the output.
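The padding step can be sketched separately from the compression function. The sketch below follows the MD5 specification (RFC 1321), where the padding starts with a single 1 bit and the length is stored as a little-endian count of bits; the function name is my own.

```python
def md5_pad(message):
    """Return the message padded to a multiple of 64 bytes (512 bits)."""
    bit_length = (8 * len(message)) % 2**64
    padded = message + b"\x80"                      # a 1 bit, then seven 0 bits
    padded += b"\x00" * ((56 - len(padded)) % 64)   # stop 8 bytes short of a block
    return padded + bit_length.to_bytes(8, "little")

blocks = md5_pad(b"abc")
```

A message of 56 bytes or more in its last block spills over into one extra 64-byte block, because the 1 bit and the length field no longer fit.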
In the key size section (page 12) it is explained that 128 bits is not enough for a hash function
against the birthday attack described below. For MD5 things are even worse: cryptanalysts have
specific attacks against MD5 that show a weakness in the compression function.
Rivest intended to design a collision-free compression function for MD5. He did not succeed.
Hans Dobbertin was able to construct a collision in the compression function (see [5]). It took 10
hours of calculating time on one Pentium 90 computer. A collision for the compression function
is a triple (ABCD, x, y) with x ≠ y such that compress(ABCD, x) = compress(ABCD, y).
It is not a collision of MD5 itself because one does not have a document prefix that will bring
ABCD to the required value. This collision is therefore not an immediate threat for PGP.
The short report in which the collision is presented does not show how the collision has been
calculated. I presume some MD5-specific techniques are used, in a very clever way. Nevertheless I
would like to point out one can expect that finding a collision in the compression function is easier
than finding a collision in the hash function.
If you want to find 1 document having a particular hash, you are looking for a $2^{-128}$ event,
while you can choose one variable: the document. This will take $2^{127}$ time steps on average.
If you are looking for two documents x and y with the same hash, you have two variables (x
and y), and by just starting to hash random documents you will find such a pair in only $2^{64}$ steps,
because you have two points floating around in the 128 bit search space instead of one. That is a
lot faster.
For finding a compression function collision one has three variables: x, y and the state ABCD.
If, and this is a big if, you know some way to make use of the extra freedom offered by ABCD, it
seems that you only need $2^{128/3} \approx 2^{43}$ hashes. If a Pentium would do one million compressions per
second, one can make about $2^{35}$ steps in 10 hours ($10^6 \cdot 36000 \approx 2^{35}$). The difference between
these two is a factor of about $2^8 = 256$.
The reasoning above is highly hypothetical, and does not tell you how to search. I just want to
point out that it is not a complete surprise that someone found such a collision as Dobbertin did.
A 128 bit hash is simply too small, considering the speed of today's computers. MD5 is no longer
recommended, and is one of the weak points of PGP. Although no one has broken it in practice, one
can expect it will happen.
3.4.2 SHA
The Secure Hash Algorithm was designed by NIST and the NSA. Its design is based on MD4, just
like MD5. SHA produces a 160 bit hash. SHA is used in the DSS standard. It is defined in the
Secure Hash Standard, FIPS PUB 180 [16].
The SHA version that everyone uses is SHA-1. SHA-1 is a modified version of the original SHA.
Just a few details were changed, presumably to make it stronger against certain attacks. PGP also
uses the modified version SHA-1.
44 CHAPTER 3. ALGORITHMS IN PGP
3.4.3 RIPEMD160
RIPEMD-160 is a 160-bit hash function, designed by Hans Dobbertin, Antoon Bosselaers, and
Bart Preneel. It is intended to be used as a secure replacement for the 128-bit hash function
RIPEMD. RIPE stands for an EU project named RACE Integrity Primitives Evaluation, 1988-
1992. It was originally published in [4].
3.4.4 Security
Birthday attack
The best attack that works against any hash function is the birthday attack. This attack is based
on the remarkable fact that in a group of only 23 people, the probability that two of them share a
birthday is already more than one half. In this attack Eve wants a signature of Bob under a
document that Bob would never sign. To get it, she lets Bob sign the hash of an innocent document.
Then she distributes the signature together with the evil document, and if it has the same hash, it
will convince people that Bob signed the evil document.
Eve generates a whole series of almost identical innocent documents, and stores all the hashes.
The documents differ in very small details: extra spaces, insignificant words changed, spelling
errors, etc. She also creates a series of almost identical evil documents, and tests if the hashes
match any of the innocent hashes stored. She makes both series $L = 2^{S_{real}(H)/2}$ hashes long, so
that the hash function has $L^2$ possible outputs.
Now the birthday effect will take place. The probability that a certain evil message does not
collide with any innocent one is $\frac{L^2 - L}{L^2} = \frac{L-1}{L}$. Quite big. However, all evil messages are generated
independently of one another, so the probability that none of the evil messages collides with a good
one is $\left(\frac{L-1}{L}\right)^L \approx 1/e \approx 0.37$ (try a few values to see this, for instance L = 100, 200, ...).
It is thus fairly likely that you have found a collision. If not, try again. This
attack shows that $S_{eff}(H) \le S_{real}(H)/2$.
Based on the 80 bit security definition and the birthday attack, a fingerprint must be at least
160 bits long. Both RIPEMD160 and SHA-1 have this length. There are no specific attacks known
that weaken their security, so both are considered safe. In fact they are both quite similar, so I
cannot say which is better.
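The birthday attack can be tried out on a deliberately weakened hash function. The sketch below is only an illustration under stated assumptions: the 16 bit hash tiny_hash (SHA-1 truncated to its first 16 bits), the document texts and all function names are invented for this example. With 16 bits, both series are $L = 2^8 = 256$ documents long.

```python
import hashlib


def miss_probability(L: int) -> float:
    """Probability that none of L evil documents hits one of L innocent hashes."""
    return ((L - 1) / L) ** L


def tiny_hash(text: str, bits: int = 16) -> int:
    """A toy hash: the first `bits` bits of SHA-1 (far too short for real use)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


def find_collision(bits: int = 16):
    """Return (innocent, evil) documents with the same toy hash."""
    L = 2 ** (bits // 2)
    attempt = 0
    while True:
        # Series of almost identical innocent documents, hashes stored.
        innocent = {}
        for i in range(L):
            doc = f"I owe Eve 1 euro. (copy {attempt}.{i})"
            innocent[tiny_hash(doc, bits)] = doc
        # Series of almost identical evil documents, tested against the store.
        for j in range(L):
            evil = f"I owe Eve 1000000 euro. (copy {attempt}.{j})"
            h = tiny_hash(evil, bits)
            if h in innocent:
                return innocent[h], evil
        attempt += 1  # "If not, try again."


if __name__ == "__main__":
    for L in (100, 200, 1000):
        print(L, round(miss_probability(L), 4))
    print(find_collision())
```

The same search against a 160 bit hash would need two series of $2^{80}$ documents, which is why that length is considered sufficient.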
3.5. SECRET SHARING ALGORITHMS 45
3.5.1 Problems
There is nothing wrong with the algorithm described above. Assuming that the random number
generator is good, it gives perfect secrecy: Without enough shares, one can say nothing about
the value of the secret: It can still become anything, depending on the missing shares. This is
probably the reason that there is only one secret sharing algorithm in PGP.
The only problem with this kind of secret sharing is that it is used to split private keys. A key
split that way must be reconstructed before use: If the shareholders want to sign something, they
must get together, and put all their shares, and their pass phrases into one computer. This poses
a serious security risk: They must carefully check that the information is later destroyed. If they
decide to do the reconstruction on the computer of one of the shareholders, he can easily modify
his PGP version to sneakily store the shares and pass phrases for later use.
Better schemes exist where a key can be split in parts, and used without having to reconstruct
the key. For RSA with modulus N and private exponent d, and m = n, a simple scheme is to find shares
$S_1, S_2, \ldots, S_n$ with $S_1 + S_2 + \ldots + S_n = d \bmod \varphi(N)$. To calculate $X = M^d \bmod N$ all players
compute $X_i = M^{S_i} \bmod N$ and together they reconstruct $X = X_1 \cdot X_2 \cdots X_n \bmod N$.
More advanced schemes exist to create threshold schemes, but the idea is the same: The
algorithm used in PGP is only suited for secret data sharing. For secret key sharing one should
use a PK-algorithm specific solution that allows one to use the shares without reconstructing the
secret.
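The additive scheme for RSA can be illustrated with a toy key. The following is a sketch only, assuming textbook RSA without padding and using hypothetical function names; real keys are of course far larger than the numbers in the test values, and a real implementation needs a secure channel for handing out the shares.

```python
import secrets


def split_exponent(d: int, phi: int, n_shares: int) -> list:
    """Split private exponent d into n_shares additive shares modulo phi."""
    shares = [secrets.randbelow(phi) for _ in range(n_shares - 1)]
    # The last share is chosen so that all shares sum to d modulo phi.
    shares.append((d - sum(shares)) % phi)
    return shares


def partial_result(message: int, share: int, modulus: int) -> int:
    """What one shareholder computes locally: M^(S_i) mod N."""
    return pow(message, share, modulus)


def combine(partials: list, modulus: int) -> int:
    """Multiply the partial results; no one ever sees d itself."""
    result = 1
    for x in partials:
        result = (result * x) % modulus
    return result


if __name__ == "__main__":
    p, q, e = 61, 53, 17           # toy parameters, assumptions for this example
    N, phi = p * q, (p - 1) * (q - 1)
    d = pow(e, -1, phi)            # private exponent (Python 3.8+)
    shares = split_exponent(d, phi, 3)
    M = 65
    X = combine([partial_result(M, s, N) for s in shares], N)
    print(X == pow(M, d, N))
```

Each shareholder only ever uses his own share, so the complete private exponent never has to be assembled on a single computer.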
Chapter 4
The PGP file format
A file format is a set of rules that describe how to build files, what is allowed and what not, and
if it is allowed, what it means. The format for all files used and produced by PGP is described in
a document called OpenPGP Message Format [3]. The name OpenPGP refers to the file format
standard that PGP conforms to, and that can also be used by other programs. The document is
an internet RFC: Request For Comments. The RFC series is a series of public electronic documents
giving information on internet related technology. They can be found at the site www.ietf.org. The
documents have no official status, they are just requests for other people to comment on, but in
practice they can be considered the ultimate authority on the subject.
This file format is used for several kinds of files: PGP-encrypted messages (.pgp files),
standalone signatures (.sig files), exported keys (.asc files¹), the public keyring file (.pkr)
and the private keyring file (.skr). The last two files are created during the installation of PGP.
Their default names are pubring.pkr and secring.skr. The private keyring contains all private keys,
the public keyring the public keys and the trust information. These files are internal to PGP, so
it is not immediately clear why they should conform to a standard. The reason they do is that one
can upgrade the PGP installation without losing one's old keys, and that OpenPGP was written
afterwards: it was not an entirely new standard, but a formalization of the status quo, with some
new features.
The file format is interesting for two reasons: First of all, it could be that the file format
specification contains a design error that leaks information or makes files easy to manipulate.
Secondly, by comparing what should be produced and accepted with what PGP actually creates and
accepts, one can check whether PGP works.
4.1 Basics
A file is a finite length sequence of bytes. A byte is 8 bits of information and has a numerical
value between 0 (included) and 256 (excluded). The OpenPGP standard uses the word octet instead
of byte. Bits can have the value 0 or 1. In this chapter the first bit of a byte, bit 0, is the most
significant one, bit 7 the least significant. The default encoding in OpenPGP is big-endian. That
means that if a number is encoded using multiple bytes, the first byte is the most significant one.
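The two conventions above can be made concrete in a short sketch. The function names are hypothetical and the bit numbering is the one used in this chapter (bit 0 is the most significant bit), which is an assumption of this example rather than something every document on OpenPGP shares.

```python
def decode_big_endian(octets: bytes) -> int:
    """Interpret a sequence of bytes as one number, first byte most significant."""
    value = 0
    for octet in octets:
        value = value * 256 + octet
    return value


def bit(octet: int, index: int) -> int:
    """Return bit `index` of a byte, counting bit 0 as the most significant bit."""
    return (octet >> (7 - index)) & 1


if __name__ == "__main__":
    print(decode_big_endian(bytes([0x01, 0x00])))  # two bytes, first one counts 256
    print(bit(0x80, 0), bit(0x80, 7))
```

The built-in int.from_bytes(octets, "big") gives the same result as decode_big_endian; the loop is spelled out only to show what big-endian means.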
The value of a byte is given as a decimal number, unless it appears grouped into two characters
at a time: In that case it is a hexadecimal value. For instance, 70 on its own is ambiguous, 00 70 is
¹ .asc can also be used for other (ASCII-armored) files.
48 CHAPTER 4. THE PGP FILE FORMAT
4.2 Packets
The data in a file is divided into packets. Each packet consists of the data it contains, prefixed with
a header indicating what kind of packet it is, and the length of the packet. The packet kinds are:
2 Signature Packet
10 Marker Packet
12 Trust Packet
13 User ID Packet
0 The header is two bytes, and the next byte contains the length. This length type can be used
for packets where the data is less than 256 bytes.
1 The header is 3 bytes long, and the next two bytes encode the length. This length type can be
used for data with a size less than 65536 bytes.
2 The header is 5 bytes long, and the next four bytes encode the length. This length type can be
used for data with a size less than 4 gigabytes.
3 The packet length cannot be determined from the header. The header is exactly one byte. The
length of the packet must be determined 'from the context'.
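The four old-style length types can be decoded as sketched below. The layout of the first header byte used here (highest bit always set, next bit clear for the old format, then four bits of packet type and two bits of length type) is taken from the OpenPGP specification and is an assumption with respect to this text; the function name is hypothetical.

```python
def parse_old_header(data: bytes):
    """Return (packet type, header size in bytes, body length or None)."""
    first = data[0]
    if (first & 0x80) == 0:
        raise ValueError("not an OpenPGP packet header")
    if (first & 0x40) != 0:
        raise ValueError("new-format header, not handled here")
    packet_type = (first >> 2) & 0x0F
    length_type = first & 0x03
    if length_type == 3:
        # One byte header; the length must come 'from the context'.
        return packet_type, 1, None
    length_bytes = 1 << length_type          # 1, 2 or 4 length bytes
    if len(data) < 1 + length_bytes:
        raise ValueError("truncated header")
    body_length = int.from_bytes(data[1:1 + length_bytes], "big")
    return packet_type, 1 + length_bytes, body_length
```

For example, the two bytes B4 05 announce a User ID packet (type 13) with 5 bytes of data.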
For new packets, the second bit is 1, and the remaining 6 bits of the first byte of the header
indicate the packet type. The header can be 2, 3, 5 bytes long. Which length is appropriate depends
on the value of the second byte of the header.
192-223 The data has length $(byte_1 - 192) \cdot 2^8 + byte_2 + 192$. (Remember that $byte_1$ is the second
byte of the header.) This can be used if the data size is in the range 192 to 8383.
224-254 This indicates that the length of the data is not completely known yet. The length of the
first part of the data is $2^{byte_1 - 224}$. After this first part a new length indicator (header without
first byte) must follow, and the rest of the file (possibly using this length type repeatedly).
The last part must have a different length type. This length specification is called partial
body length.
255 This indicates that the length is encoded in the next four bytes. This length type can be used
for any data size less than 4 gigabytes.
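The length rules for new packets can be sketched as follows. This is an illustration based on the length encoding in the OpenPGP specification, including the one-byte case for values below 192; the function name and the shape of the return value are choices made for this example.

```python
def parse_new_length(data: bytes, pos: int = 0):
    """Decode a new-format length starting at data[pos].

    Returns (length, number of length bytes used, is_partial).
    """
    first = data[pos]
    if first < 192:
        # One byte encodes the length directly.
        return first, 1, False
    if first < 224:
        # Two bytes: (byte1 - 192) * 2^8 + byte2 + 192.
        return ((first - 192) << 8) + data[pos + 1] + 192, 2, False
    if first < 255:
        # Partial body length: a power of two, more length indicators follow.
        return 1 << (first - 224), 1, True
    # 255: the next four bytes hold the length, most significant byte first.
    return int.from_bytes(data[pos + 1:pos + 5], "big"), 5, False
```

A reader of partial body lengths keeps calling such a function after each part until it gets a result that is not partial.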
An important point of this design is that one can recognize the type of packets. This means
that PGP does not need to know whether it is reading a key or a signature. I have tried to confuse
PGP by feeding it key files that have the extension ".sig". It turns out that PGP completely relies
on the packet types. All other information is ignored. Another good point of this encoding is that
PGP knows in advance the length of the upcoming data (most of the time). This allows PGP to
create buffers of exactly the right size. In cases where programs need to read until the end of the
line, hackers often try to make huge lines so that their data is copied to a location that is beyond
the end of the buffer.
4.3.2 Version 3
A version 3 public key packet consists of:
4.3.3 Version 4
04 (=version indicator), [time, 4 bytes] , [PK algorithm, 1 byte], [big numbers]
For RSA, the big numbers are (n, e). For DSA they are (p, q, g, y). For ElGamal they are (p, g, y). There
is no information about the expiration of this key. That information must be stored in signatures
attached to the key. The expiration date in the self signature can be seen as the key expiration
date, but note that in this new format the signer can re-sign his key to extend its lifetime without
losing the signatures by other people.
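Reading the fields of a version 4 public key packet could look like the sketch below. Several details are assumptions taken from the OpenPGP specification rather than from the text above: each big number is stored as a two byte bit count followed by the number itself, and the algorithm identifiers are 1 for RSA, 16 for ElGamal and 17 for DSA. The function names are hypothetical.

```python
# Number of big numbers per public key algorithm: RSA (n, e),
# ElGamal (p, g, y), DSA (p, q, g, y).
BIG_NUMBER_COUNT = {1: 2, 16: 3, 17: 4}


def read_big_number(data: bytes, pos: int):
    """Read one big number at data[pos]; return (value, position after it)."""
    bit_count = int.from_bytes(data[pos:pos + 2], "big")
    byte_count = (bit_count + 7) // 8
    start = pos + 2
    value = int.from_bytes(data[start:start + byte_count], "big")
    return value, start + byte_count


def parse_v4_public_key(body: bytes):
    """Return (creation time, PK algorithm, list of big numbers)."""
    if body[0] != 4:
        raise ValueError("not a version 4 key packet")
    created = int.from_bytes(body[1:5], "big")   # seconds since 1970
    algorithm = body[5]
    if algorithm not in BIG_NUMBER_COUNT:
        raise ValueError("unknown public key algorithm")
    numbers, pos = [], 6
    for _ in range(BIG_NUMBER_COUNT[algorithm]):
        value, pos = read_big_number(body, pos)
        numbers.append(value)
    return created, algorithm, numbers
```

Note that nothing in these fields says when the key expires, which matches the remark above that expiration is stored in signatures.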
[All fields of the corresponding public key], [1 byte setting the string to key procedure],
[optional: a symmetric encryption algorithm, 1 byte], [optional: a string to key specifier],
[if encrypted: 8 byte initial value], [encrypted big numbers], [2 byte checksum]
The big numbers are (d, p, q, u) for RSA, and x for ElGamal or DSA. The [1 byte setting the string
to key procedure] can be a symmetric key algorithm specifier. In that case the pass phrase must
be converted to a key by using MD5, and the optional fields are not present. If the byte has the
value 255, the optional fields are present. The given block cipher is used, and the given string to
key specifier is applied. If the byte is 0, the secret fields are not encrypted, and there is no initial
value present.
² See page 62 for the actual values of algorithm specifiers.
4.4. SIGNATURE PACKETS 51
For version 4 keys, the bytes of all secret numbers and the checksum are encrypted using CFB,
with the given initial value. The checksum is the sum modulo 65536 of the bytes of the secret
numbers. The checksum is used to determine whether the passphrase is correct.
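The checksum is simple enough to write down directly. In this sketch the function names are hypothetical, and storing the two checksum bytes most significant byte first follows the default big-endian encoding described in the Basics section.

```python
def secret_key_checksum(secret_bytes: bytes) -> bytes:
    """Sum of all bytes of the secret numbers modulo 65536, as two bytes."""
    return (sum(secret_bytes) % 65536).to_bytes(2, "big")


def passphrase_seems_correct(decrypted: bytes) -> bool:
    """Check decrypted secret data whose last two bytes are the checksum."""
    secret, stored = decrypted[:-2], decrypted[-2:]
    return secret_key_checksum(secret) == stored
```

With a wrong pass phrase the decrypted bytes are essentially random, so the check fails except for roughly one wrong pass phrase in 65536.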
For version 3 keys, the first two bytes of each secret number are not encrypted (the first two
bytes contain the length of the number). The rest of the secret number data is encrypted using
CFB (see page 35). A sync is done at the start of the next secret number. The checksum is also
stored unencrypted, and is calculated the same way as in V4.
The reason version 3 keys do not have their length encrypted, is probably that it is convenient
for programmers to know the length beforehand. The reason the V4 format does encrypt it, is
that this information might help an attacker. The OpenPGP authors advise against using the old
format unless you are forced to do so for compatibility reasons.
4.3.5 Subkeys
In the first version of PGP all users had one key, an RSA key, that was used for both encryption
and signing. Since PGP 5.0 things can be different: Every user has one signing key. This key has
a couple of user IDs attached, and a few subkeys. A subkey is used for encryption, and can be
revoked independently from its superkey. It is signed by its superkey. If the police ever force you
to hand over your key because they want to read your email, you can give them the relevant subkey
instead of a key they can also sign with.
00 : Signature on a binary document: The signer asserts that he owns, made or saw the file, or
that it was not modified.
01 : Signature on a text document. The difference between this type and the first is that this
signature is made on a file with the line endings converted to [carriage return][line feed] and
trailing spaces removed. This is useful because it means that the signature remains valid if
the document is sent to a different computer platform with different line ending rules.
02 : Standalone signature: this signature only signs its subpackets. This is useful for version 4
signatures because they can contain extra information.
10 : Generic certification of a public key and a user ID. It does not state how thorough the signer
checked that it is correct.
11,12,13 : These stand respectively for persona, casual and positive verification of the user ID and
key. They do state something about the effort the signer made to ensure it is right: persona
verification means that the signer did not do any verification, casual stands for some casual
verification and positive stands for substantial verification. This is all very vague. According
to the OpenPGP standard, this vagueness is not a flaw, but a feature of the system, but
I do not see the use of fine grained claims with no precise meaning. All PGP certification
signatures so far are of type 10.
18 : Subkey binding signature. It states that this subkey belongs to the topkey. It is calculated
on the subkey itself, not on any user ID.
1F : Signature on a key. It is useful if the signature contains subpackets that give information on
the key. For those to be valid, the signature must be made by the key itself. Other people
can use this signature to make statements (using subpackets) about the key.
20 : Key revocation signature. The signature is calculated on a public key that is being revoked.
It can be made by the key itself, or a key designated as a revocation key.
30 : Certification revocation signature. This signature can revoke a certification signature of type
10-13. It can be made by the key of the signature, or a revocation key of that key.
40 : A timestamp signature. This signature only states the time that is in it.
4.4.1 version 3
A version 3 signature consists of
RSA
The result is padded according to subsection 3.3.4. The prefixes are based on some sort of
formal encoding scheme, explained in [10].
The reason that there is an algorithm-specific prefix inside the RSA encoding is to prevent
someone from lifting a signature to another document by changing the hash function. Suppose we have
two different hash functions, one strong and one weak. You always use the strong one when signing
your documents. I can find one of your signatures, change the hash function indicator to the weak
algorithm, and find an evil document with the same hash (this should not be too hard because we
are breaking a weak hash function). This is not possible if your signature contains information
about which hash function should be used: To do this attack on a classic RSA signature one must
also break RSA.
DSA
For DSA two big integers are added: r and s. The input of the signature algorithm is the output
of a hash function with 160 bit output. There is no further processing. This means that DSA
signatures are vulnerable to the hash-algorithm change attack described above. At the moment
none of the 160 bit hash functions in PGP are considered weak, but once one is broken it makes
all DSA signatures untrustworthy, even if they come from a trusted person.
4.4.2 Version 4
A version 4 signature consists of
04 (=version number), [signature type], [PK algorithm], [hash algorithm], [total length
of the following subpackets, encoded in two bytes], [a few subpackets that will be
hashed], [total length of the following subpackets, encoded in two bytes], [a few subpackets
that will not be hashed], [first two bytes of hash value], [big integers of signature].
The hash is calculated over the signed data, followed by everything from the version number up to
and including the hashed subpackets.
2 signature creation time. 4 bytes. This must be present in the hashed subpacket section.
3 signature expiration time. 4 bytes. If it is not present or has a value of zero, the signature does
not expire.
5 trust signature 1 byte indicates the level of trust in this key: 0 means valid, 1 means that the
owner of this key is a trusted introducer, 2 means it is a trusted introducer of introducers,
...
6 regular expression ending in 00. This regular expression limits the trust signature. It states
that the trust only applies to the user ID’s matching the expression. The regular expression
format is described fully in [3].
7 revocable 1 byte. 0 means not revocable, 1 means revocable. If not present, the signature is
revocable.
9 key expiration time 4 bytes. The number of seconds after the creation of the key at which the key
expires. If 0, the key does not expire. This can only be found on a self signature.
10 placeholder for backward compatibility. I do not know what it means, but I assume it can be
ignored.
11 preferred symmetric algorithms (block ciphers). This is only found on a self signature, and is
a sequence of magic numbers of block ciphers in decreasing order of preference.
12 revocation key. It consists of a class (1 byte), algorithm id (1 byte) and a fingerprint. It means
that the key with the given fingerprint may revoke this key. It is useful to assign this if you
might forget the passphrase of the key, because then you cannot revoke it yourself. The class
byte must have the first bit set. If the second bit is set, this information is sensitive, and
should not be exported unless necessary. This subpacket is found on a self signature, and
must be in the hashed section.
20 notation data. Consists of 4 bytes of flags (first bit is set if it is human-readable), length of
name N (2 bytes), length of value V (2 bytes), name(N bytes), value (V bytes). This can be
used to make any comment the signer wishes.
21 preferred hash algorithms. This is only found on a self signature, and is a sequence of magic
numbers of hash functions in decreasing order of preference.
23 key server preferences. This packet can only be found on a self signature, and consists of any
number of flag bytes that tell a key server how to handle this key. Only the first bit has a
meaning: if it is set it means: ’no modify’, so that only the key owner can modify this key on
the key server. I hope they add useful flags in the future, because at this moment I do not
see the point of this subpacket.
24 preferred key server One string with the URL of the preferred key server. Note that each user
ID may have its own preferred key server.
25 primary user id. One octet telling that this user ID is the primary user ID of this key.
26 policy URL. A string with a URL to a document that describes the policy that the signature
was made under.
27 key flags. A string of flags. Currently defined are: bit 7: may be used to certify other keys, bit
6: this key can sign data, bit 5: this key can encrypt communications, bit 4: this key can be
used to encrypt storage, bit 3: the private part of this key may be split, bit 0: the private
part of this key can be owned by more than one person.
More flags can be added in the future. Superfluous flags must be set to zero, absent flags are
considered zero. Some of these flags only make sense on a self signature.
28 signer’s user id. Using this subpacket one can indicate which user ID made the signature, for
instance to differentiate business- from personal signatures.
29 reason for revocation. Consists of revocation code (1 byte) and a string stating the reason of
revocation. The first byte can be 00 (no reason specified), 01 (key is superseded), 02 (key has
been compromised), 03 (key is no longer used) or 20 (user ID is no longer valid). 20 may appear
only on certification signatures, 01, 02 and 03 on key signatures and 00 on both kinds.
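As an illustration of how such a subpacket is read, the key flags subpacket (27) from the list above can be decoded as sketched below. The hexadecimal masks are taken from the OpenPGP specification; with the bit numbering of this chapter, mask 01 is bit 7 and mask 80 is bit 0. The names are hypothetical.

```python
# (mask, meaning) pairs for the first flag byte of the key flags subpacket.
KEY_FLAGS = [
    (0x01, "certify other keys"),
    (0x02, "sign data"),
    (0x04, "encrypt communications"),
    (0x08, "encrypt storage"),
    (0x10, "private part may be split"),
    (0x80, "owned by more than one person"),
]


def decode_key_flags(flags: bytes) -> list:
    """Return the meanings of all flags set in the first flag byte."""
    # Absent flags are considered zero, so an empty string sets nothing.
    first = flags[0] if flags else 0
    return [meaning for mask, meaning in KEY_FLAGS if first & mask]
```

A typical signing key would carry the flag byte 03 (certify and sign), while an encryption subkey would carry 0C (encrypt communications and storage).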
key is present, it can be decoded with the pass phrase (CFB, all-zero IV). If not, the pass phrase
converted with the S2K gives the session key. In both cases, the result is a symmetric algorithm
specifier followed by the session key.
Marker packet(10)
This packet spells ’PGP’: 50 47 50. It can be placed at the beginning of a message that cannot be
read by PGP 2.6.x to make sure that that program does not try to handle what it cannot handle.
Trust packet(12)
The trust packet must only be used in keyrings. This packet contains the user's specification of
which key holders are trustworthy introducers. OpenPGP does not describe how this packet is
formed.
4.6. KEY ID AND FINGERPRINT 57
User ID packet(13)
This packet contains one string. By convention, it contains a name followed by an email address
enclosed in <>.
Simple s2k This s2k consists of 00 [hash algorithm id, 1 byte]. The corresponding procedure is:
Apply the hash function on the pass phrase.
A weakness in this procedure is that a dictionary attack can be mounted: An attacker can
build a key database by hashing a large set of common and likely passphrases. Many people
can use this database many times, to try to guess the pass phrase of a key. This attack is
better than just trying passphrases because the hashing needs to be done only once.
Salted s2k A specifier for salted s2k is 01 [hash algorithm, 1 byte] [8 byte salt value]. This works
the same as simple s2k, but instead of hashing the pass phrase, the concatenation of the salt
and the pass phrase is hashed. This approach makes a precomputed dictionary attack impractical:
The attacker needs a database built with the right salt value. Because this value is chosen
differently each time, the attacker cannot reuse such a database.
iterated and salted s2k This specifier looks like 03 [hash algorithm] [8 byte salt value] [1 byte
count value c]. The value c specifies a number of counts, using the formula
$count = (16 + (c \bmod 16)) \cdot 2^{\lfloor c/16 \rfloor + 6}$. Count specifies the total number of bytes that will
be hashed. To get these bytes, the salt and pass phrase are fed into the hash function
repeatedly. In the unlikely case that a low count is specified, such that the count is less
than the length of the salt plus the pass phrase, the salt and pass phrase are hashed entirely.
This count can be used if you are encrypting and decrypting on a fast computer, to frustrate
attackers. If you use a slow procedure, they are also forced to do the extra calculations when
they do a passphrase guessing attack. The theoretical security does not improve, because
you have no advantage over the attacker: both of you must do more work. In practice, it can
really frustrate a brute force attacker, and OpenPGP recommends using this string to key
specifier. No particular count is prescribed. If you know that both the encrypting and the
decrypting computer have plenty of time, and that the pass phrase might be guessable, use
c = 255. If you are PGPing with a webserver that needs to do a lot of crypto, you can use the
salted s2k instead, because otherwise you would seriously drag down server performance.
If the output of the hash function is too short to be used as a key, the procedure is repeated,
but with the value 00 prefixed to the input of the hash function. If another output is needed, 00
00 is prefixed, and so on. In practice, a 128 bit key must be produced from a 160 bit hash value, so
this is not relevant, but it does apply if, for instance, a 256 bit AES key must be produced with MD5.
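The iterated and salted procedure, including the rule for keys that are longer than one hash output, can be sketched with Python's hashlib. This is a minimal illustration and not the PGP source: the function names are hypothetical and SHA-1 is chosen only as an example hash.

```python
import hashlib


def s2k_count(c: int) -> int:
    """Decode the one byte count value c into a number of bytes to hash."""
    return (16 + (c % 16)) << ((c // 16) + 6)


def iterated_salted_s2k(passphrase: bytes, salt: bytes, c: int,
                        key_length: int, hash_name: str = "sha1") -> bytes:
    """Derive key_length bytes from the pass phrase (sketch)."""
    data = salt + passphrase
    # A count below len(data) still hashes salt and pass phrase entirely.
    count = max(s2k_count(c), len(data))
    key = b""
    zero_prefix = 0
    while len(key) < key_length:
        h = hashlib.new(hash_name)
        # Extra outputs are produced by prefixing 00, then 00 00, and so on.
        h.update(b"\x00" * zero_prefix)
        full_repeats, remainder = divmod(count, len(data))
        for _ in range(full_repeats):
            h.update(data)
        h.update(data[:remainder])
        key += h.digest()
        zero_prefix += 1
    return key[:key_length]
```

With c = 255 the count is 65011712 bytes, so every single pass phrase guess costs the attacker the hashing of about 65 megabytes.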
4.7.1 Problems
The string to key conversion seems to be dealt with properly. I see no problems. The use of a hash
function assures that the key contains uniformly distributed bytes, thus assuring that no weak key
is selected. The salting and iterating are designed to hinder a dictionary attack, but the only real
protection against such an attack is a good pass phrase (long and random). If the pass phrase is
not good, the disclosure of your key is a matter of time, and the above measures will only buy you
time. They do not give the user any advantage over the attacker.
4.8.1 Keys
To export a key, concatenate the following: A public key packet or a secret key packet, maybe a few
revocation signatures, one or more user ID packets. Each user ID packet must have one or more
signatures directly behind it. The first signature is the self signature: it is made by the key itself.
It shows that the creator of the user ID indeed had access to the key, and in the new format it can
state extra information about the key, like the expiration date. The other signatures are made by
other keys. After the user ID packets and their signatures the subkeys can follow. Each subkey must
be directly followed by a (subkey binding) signature, and maybe a few revocation signatures.
4.8.2 Messages
To PGP a message, quite a few packets must be combined. The allowed combinations are given by the
following grammar, in which a | b denotes a choice (a or b), a , b means a followed by b, and
a, b | c means (a,b)|c.
OpenPGP Message:- Encrypted Message | Signed Message | Compressed Message | Literal Message.
ESK:- Public Key Encrypted Session Key Packet | Symmetric-Key Encrypted Session Key Packet.
Any characters not used by the encoding, like spaces and returns, can be inserted in the data
and will be ignored. For Header generation, the data is split in lines with at most 76 characters,
and header and footer lines are added.
Possible header lines are.
After this line zero or more key value pairs may be printed, to give extra information. Currently
defined keys are
MessageID: [used to find all parts of a split message. A 32 character string of printable characters]
Hash: [a comma-separated list of the hash algorithms used. For clearsigned messages]
Charset: [the character set used. To indicate a nonstandard character set(like Japanese)]
After the key-value pairs an empty line follows, and then the encoded data, and an end line. An
end line is formed by replacing the word BEGIN with the word END. One exception is that a
-----END PGP SIGNED MESSAGE----- line is not used if a message and a signature are in the same file.
The message ends where the signature starts.
A file can be split into multiple files, in case the mail program imposes a maximum on the permitted
message length. The PART X or PART X/Y header must then be used, with X the number of the
part, and Y the total number of parts. Parts of the same message must have the same message-ID.
If you only want to sign a message, and not encrypt, you do not need to re-encode the message.
You can put the message in the clear, and then the signature ascii-armored, like this:
Hi Bob,
I love you.
XXX
Alice
-----BEGIN PGP SIGNATURE-----
Version: PGP 7.0
iQCVAwUBO2V7Pyp3hQindTDDAQEgOAP/RhGnALKArLCLDoBLJidkETkQQSFFO/t4
noDX7fNKpMoQGyDkRPI22bqhTp/NCQEPh5wmrmk/UzP0wNOPJXW/yUVIaUgZWDWN
OEeTDkS5/ifsir0mzSyR+HDzt2mo6OzPpPuStG3AGhVkHiEwGKjagEB37Gau/40e
H5fupJo0jg0=
=Oz2J
-----END PGP SIGNATURE-----
4.10 Constants
The following tables give the constant values that are used within OpenPGP.
Block ciphers
0 Plaintext or unencrypted data
1 IDEA
2 Triple-DES (DES-EDE,168 bit key derived from 192)
3 CAST5 (128 bit key)
4 Blowfish (128 bit key, 16 rounds)
5 SAFER-SK128 (13 rounds)
6 Reserved for DES/SK
7 Reserved for AES with 128-bit key
8 Reserved for AES with 192-bit key
9 Reserved for AES with 256-bit key
100 to 111 Private/Experimental algorithm
Hash algorithms
1 MD5 "MD5"
2 SHA-1 "SHA1"
3 RIPE-MD/160 "RIPEMD160"
4 Reserved for double-width SHA (experimental)
5 MD2 "MD2"
6 Reserved for TIGER/192 "TIGER192"
7 Reserved for HAVAL (5 pass, 160-bit) "HAVAL-5-160"
100 to 111 Private/Experimental algorithm
Chapter 5
Bugs and attacks
Bugs are errors in computer programs. All large computer programs contain lots of them, but
most bugs are not very dangerous and live on undetected until they are replaced by new code with
different bugs. Some bugs are found during tests done by the maker or vendor of the software. In
this case, the maker fixes the bug in the next version, and if it is serious the maker writes a
'patch' (a piece of program that can fix installed versions) and distributes it to the registered users.
Some software makers also publish the bug on the internet, others do not.
Sometimes people find bugs in programs they did not write. If it is a bug that affects computer
security, the following procedure is often followed: The vendor is informed, and gets a reasonable
time (2 to 4 weeks) to fix the bug and inform its customers. Then the discoverer publishes the bug
on the internet. There are special newsgroups and mailing lists for discussing bugs. One website for
computer security is www.securityfocus.com, and they maintain the bugtraq database of computer
bugs and the bugtraq mailing list.
Often vendors do not react properly the first time they are informed. They do not feel like
fixing the bug, deny the bug, or the message gets lost because no one in the company feels it is
his task to answer the mail. The discoverer will wait a 'reasonable' time (some hackers feel that
20 minutes is reasonable) and then publish the bug anyway. This will greatly help evil minded
people in attacking systems.
After the publication of the bug it is often fixed in the next version and can be patched. This
does not mean the bug is history: not everyone installs all patches and uses the newest version of
all their software. The bug or similar bugs can also be present in other software products. In 1999
the format string bug was discovered: It turned out that a very common function in the language
C, sprintf, had a very unusual feature: It can overwrite arbitrary memory locations with arbitrary
values. This ’feature’ was available more than 20 years, but not all programmers knew about it.
Thousands of programs have this bug, and every now and then a new instance of this bug makes it
to bugtraq. In 2001, GPG turned out to have a format string bug. Other bugs also tend to return,
and therefore I have tried to give an overview of security bugs that were discovered in the past in
PGP, and I have tried to check whether they have been fixed.
64 CHAPTER 5. BUGS AND ATTACKS
The problem is that read does not return the random byte: it returns the number of bytes read,
which is 1 because count is 1. The line should have been:
This problem was found by Germano Caronni on May 23 2000. It had already been solved without
being discovered in the newer versions.
This is a major bug, because it makes the keys that were generated non-interactively with this
version completely worthless (note that the manual advises against generating keys
non-interactively). A large part of PGP is devoted to securely generating random numbers. It has
a very sophisticated randomness administration, and it hashes all random data at least twice. If
you do this kind of work, and it turns out that it does not do anything, it is a devastating bug. At
the same time this bug is not interesting from a cryptographic viewpoint.
He discovered that PGP does not complain about encountering an ADK in this area, although
PGP never puts them there and it is not allowed in OpenPGP. This is thus a bug in certain
implementations, not a flaw in OpenPGP. An attacker could obtain a victim's public key, add his
own key as ADK, and then distribute the key as widely as possible. People using this modified
key would normally (it is considered polite, because it seems the victim asked for it) encrypt their
messages in such a way that the owner of the ADK can also read them.
The vulnerable versions are 5.5.x to 6.5.3. It is fixed in version 6.5.8. This bug can be described
in two ways: as a design specification error, because apparently the designers of PGP overlooked
this unusual attack, or as a typo, because all that was needed to correct the bug was adding one
if-statement. The bug can be found in the function ringKeyAdditionalRecipientRequestKey in the
file pgpsrc\libs\pgpcdk\priv\keys\keys\pgpRngPub.c. In this file there is the statement
The essential variable is hashed. The function ringKeyFindSubpacket will set it to true if it found
the ADK packet in the part of the key that is protected by the signatures. In 6.5.1 this variable
was ignored:
Besides the above fix, all PGP software, including the key server program, was modified to delete
any ill-placed ADK packet. This makes mounting an ADK attack very hard, since the extra ADK
packet will not be distributed by the key server or by users with new versions of PGP.
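The effect of the one extra condition can be pictured with a small sketch. The struct, the constant SUBPKT_ADK and both function names are hypothetical and far simpler than ringKeyFindSubpacket; only the idea of honouring an ADK solely when it lies in the signed (hashed) area is taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a subpacket found on a key. */
struct subpacket {
    int  type;      /* e.g. SUBPKT_ADK */
    bool hashed;    /* true if it lies in the area covered by the signature */
};

enum { SUBPKT_ADK = 10 };   /* illustrative value only */

/* Vulnerable behaviour: any ADK subpacket is honoured. */
bool has_adk_unchecked(const struct subpacket *pkts, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pkts[i].type == SUBPKT_ADK)
            return true;
    return false;
}

/* Fixed behaviour: one extra condition rejects ADKs outside the hashed area. */
bool has_adk_checked(const struct subpacket *pkts, size_t n)
{
    assert(pkts != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (pkts[i].type == SUBPKT_ADK && pkts[i].hashed)
            return true;
    return false;
}
```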
Ralf Senderek placed his report [19] on his website, together with a reply by Zimmermann.
Both are worth reading. Senderek has some very radical ideas about PGP: he believes
version 2.6.x is the best version of PGP, and that the new versions are too complex to be secure.
He also states that it is not possible to do a source code analysis for a program of that size, and
that ADK is a bad idea. Instead of source code analysis he proposes experimenting, as one would
do in physics.
No one has ever found ill-placed ADK packets in the wild. A similar attack, with less harmful
results, can be done by adding a revocation certificate in the unhashed area. This could lead to
a denial of service attack. By doing this attack the attackers do not gain anything, but the
defenders do lose a lot. If the ADK bug had not been discovered, someone could have revoked keys, which
would lead to a lot of confusion. Experiments by Martijn Stam (footnote 1) show that this can no longer be
done.
1. Break into the victim’s office, start his computer, copy his private keyring to a floppy disk,
and make certain clever changes to his private key(s).
2. Wait until the victim comes back and makes a signature with his private key. (You can
facilitate this with some social engineering: "You have won our Britney Spears Lottery. Send
us this filled-in and signed coupon and we will send you a billboard-size poster.") In normal
circumstances it is completely safe to make and publish signatures, so there is no reason why
the victim would not do this.
3. Compute the secret key using the signature. This requires a little math depending on the
kind of key (RSA or Diffie-Hellman, new or old format).
4. Break in again and restore the original private keyring. This is necessary if you do not want
the victim to find out his key is compromised. With the modified key you cannot decrypt
properly.
The first attack in the paper is on RSA private keys. These keys have the following structure
(described in detail on page 50):
• secretdata: d, p, q, u, checksum
All big numbers (n, e, d, p, q and u) are stored in the following format: first the length as a 16-bit
number, then the content bytes holding the data in most-significant-byte-first order. In the
version 3 format, compatible with PGP 2.6.x, only the content of the numbers is encrypted, not
the length. In the version 4 format (supported since PGP 5.0, see page 50), the entire secretdata
section is encrypted as one large chunk. The two-byte checksum consists of the sum of all secretdata
bytes modulo 65536. What the attacker tries to do is change the value of u. To attack the version
3 format, he lowers the length of u by 1, and also lowers the checksum by 1, so that the checksum is still
correct.
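Why such a simple checksum offers no protection can be seen in a small sketch. The function name checksum16 and the six-byte example buffer (a two-byte length followed by four content bytes) are hypothetical; only the rule "sum of all bytes modulo 65536" comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of all bytes modulo 65536, as described for the secretdata checksum. */
uint16_t checksum16(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (sum + data[i]) & 0xFFFFu;
    return (uint16_t)sum;
}
```

Lowering the low length byte by one (as long as it is not zero) lowers the sum by exactly one, so an attacker who adjusts the stored checksum in the same way leaves the pair consistent without knowing any secret.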
When the victim tries to do a calculation with the corrupted key, he will construct the final
output of the RSA operation in the following way (described on page 37):
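A toy sketch with small numbers illustrates why a corrupted u is fatal. It assumes the textbook CRT recombination s = s_p + p * (u * (s_q - s_p) mod q) with p < q and u = p^-1 mod q, which may differ in detail from PGP's code; the function names and the parameters (p = 53, q = 61, e = 17, d = 2753) are chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t powmod(uint64_t base, uint64_t exp, uint64_t mod)
{
    uint64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % mod;   /* fine for moduli below 2^32 */
        base = (base * base) % mod;
        exp >>= 1;
    }
    return result;
}

static uint64_t gcd(uint64_t a, uint64_t b)
{
    while (b != 0) { uint64_t t = a % b; a = b; b = t; }
    return a;
}

/* Textbook CRT signing with p < q and u = p^-1 mod q. */
uint64_t rsa_sign_crt(uint64_t m, uint64_t d, uint64_t p, uint64_t q, uint64_t u)
{
    assert(p < q);
    uint64_t sp = powmod(m, d % (p - 1), p);
    uint64_t sq = powmod(m, d % (q - 1), q);
    uint64_t h  = (u * ((sq + q - sp % q) % q)) % q;
    return sp + p * h;
}

/* Given a faulty signature s on a known m, gcd(s^e - m, n) exposes a factor. */
uint64_t factor_from_faulty_signature(uint64_t s, uint64_t m, uint64_t e, uint64_t n)
{
    uint64_t diff = (powmod(s, e, n) + n - m % n) % n;
    return gcd(diff, n);
}
```

A signature made with a wrong u is still correct modulo p but wrong modulo q, so s^e - m is a multiple of p but not of n, and a single gcd yields the factor p.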
Footnote 1: the source is personal correspondence; the results were not published.
5.5. CZECH PRIVATE KEY ATTACK 67
DSA keys
To reach a complete knock-out, the Czech researchers also have a similar attack on DH/DSA
keys that actually works against PGP. The security of DSA is based upon the intractability of the
discrete logarithm problem, and what the attackers do is modify the public parameters to make
the discrete logarithm problem easier. They change p to $p' = 167 \cdot 2^{151} + 1$. There is nothing
particular about 167, except that it is a small prime, and that $p'$ is again prime and $p' < q$. They
also select a new generator $g'$ with ${g'}^{(p'-1)/2} \bmod p' \neq 1$ and ${g'}^{(p'-1)/167} \bmod p' \neq 1$. This ensures that $g'$
is again a generator of the group $\mathbb{Z}_{p'}^{*}$. Now the attackers wait for a signature on a known message
m consisting of $r'$ and $s'$. It follows that
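Assuming the modified key is used with the standard DSA signing equations (ephemeral value k, hash h(m), secret key x), and using $p' < q$ so that the reduction modulo q leaves $r'$ unchanged, the relations can be sketched as:

```latex
\begin{align*}
r' &= \left({g'}^{k} \bmod p'\right) \bmod q = {g'}^{k} \bmod p' \\
s' &= k^{-1}\left(h(m) + x\,r'\right) \bmod q \\
x  &= \left(s'\,k - h(m)\right) {r'}^{-1} \bmod q
\end{align*}
```

The symbols k and h(m) are the usual DSA names and are an assumption here.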
What the attacker does is retrieve k from the first equation. This means solving the discrete
logarithm problem, but this instance of the problem is solvable because of the special structure of $p'$:
the order $p' - 1$ of the group has only small prime factors, so Pohlig-Hellman can be applied (see page 40).
Then they use the last equation to find the secret value x.
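How Pohlig-Hellman exploits such a modulus can be tried out on a toy example. The sketch below is my own illustration, not the researchers' code: it uses the much smaller prime P = 167 * 2^7 + 1 = 21377, which has the same shape as p', and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy modulus with the same shape as p': P = 167 * 2^7 + 1 (prime). */
#define P      21377u
#define ORDER  21376u   /* P - 1 = 2^7 * 167 */

static uint64_t powmod(uint64_t base, uint64_t exp)
{
    uint64_t result = 1;
    base %= P;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % P;
        base = (base * base) % P;
        exp >>= 1;
    }
    return result;
}

/* The two tests quoted in the text: g generates the whole group iff both hold. */
int is_generator(uint64_t g)
{
    return powmod(g, ORDER / 2) != 1 && powmod(g, ORDER / 167) != 1;
}

uint64_t find_generator(void)
{
    uint64_t g = 2;
    while (!is_generator(g)) g++;
    return g;
}

/* Recover k (0 <= k < ORDER) from y = g^k mod P. */
uint64_t discrete_log(uint64_t g, uint64_t y)
{
    assert(is_generator(g) && y % P != 0);

    /* Part 1: k mod 128, one bit at a time, in the subgroup of order 128. */
    uint64_t g1 = powmod(g, 167), y1 = powmod(y, 167);
    uint64_t g1_inv = powmod(g1, P - 2);
    uint64_t k2 = 0;
    for (int i = 0; i < 7; i++) {
        uint64_t t = (y1 * powmod(g1_inv, k2)) % P;   /* strip known bits */
        if (powmod(t, 1u << (6 - i)) != 1)
            k2 |= 1u << i;
    }

    /* Part 2: k mod 167, by trying all 167 candidates. */
    uint64_t g2 = powmod(g, 128), y2 = powmod(y, 128);
    uint64_t k167 = 0, acc = 1;
    while (acc != y2) { acc = (acc * g2) % P; k167++; }

    /* Part 3: combine both residues (Chinese remainder theorem). */
    uint64_t k = k2;
    while (k % 167 != k167) k += 128;
    return k;
}
```

The work is a handful of modular exponentiations plus at most 167 trial multiplications. For the real p' the same structure applies with 151 bits to recover instead of 7, which is still trivial.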
Several people, including Bruce Schneier and Phil Zimmermann, downplay this attack, because
it assumes it is possible to break into someone's office. If you can break into someone's
office, you have won the game: you can install all kinds of malicious software or install equipment
to eavesdrop on the keyboard. These two options are much better than doing the Czech attack: in
the case of DH, you do not get any decryption key, just a signing key, and the attack can be discovered
because the signatures made with the modified key are invalid.
These people are right, but one can think of special circumstances in which the Czech attack is
a good attack, for instance if someone mails his private keyring to another computer, relying on the
passphrase protection. The paper that describes the attack gives some valuable hints on how PGP can be
modified to prevent this attack: add a good checksum, verify the public key parameters, and verify
any signature made. I think these are valuable hints that should be implemented as soon as possible.
On the other hand, Zimmermann’s reply is not entirely wrong. The PGP manual states that
people should guard their private keyring. PGP assumes the user controls his computer, and the
scenario for this attack violates this assumption. Therefore it is, in a strict sense, not PGP’s
problem. However, if they believe this, why do they store the secret key encrypted?
Zimmermann is also not very happy about the way the attack was announced: at a trade show
(Cebit) without warning NAI beforehand.
One can also apply the same technique to encrypted files: they do not contain a checksum either.
This is important to know if you use PGP for conventional encryption: PGP only protects signed
files against changes. Any other file can be changed, and the bits in the last block of the file can
be changed in a predictable manner.
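The predictable change can be illustrated under the assumption that the file is encrypted in a CFB-style mode, where the last plaintext block equals the last ciphertext block XORed with a keystream that depends only on the key and on earlier ciphertext. The sketch does not implement a cipher: the keystream is just a given array, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* out = a XOR b, byte by byte. Used both to "encrypt" and to "decrypt" the
 * last block with a stand-in keystream (not a real cipher). */
static void xor_block(uint8_t *out, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) out[i] = a[i] ^ b[i];
}

/* Attacker's step: change known plaintext `from` into `to` without the key,
 * by XOR-ing their difference into the last ciphertext block. */
void tamper_last_block(uint8_t *c_last, const uint8_t *from, const uint8_t *to, size_t n)
{
    assert(c_last != NULL && from != NULL && to != NULL);
    for (size_t i = 0; i < n; i++) c_last[i] ^= from[i] ^ to[i];
}
```

The attacker never touches the keystream: knowing (or guessing) the original plaintext of the last block is enough to turn it into any other text of the same length.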
This attack shows what one can do with a little mathematics, knowledge of the file format, and
a very wide definition of the word ’attack’. I think it is a very interesting and even spectacular
result. The conclusions of the report are important for everyone writing cryptographic software:
Make all possible checks when reading key material.
Opening an ASCII armored file such as a public key or a detached signature can cause
the creation of an arbitrary file on the target machine. On the Windows platform this
can lead to the execution of arbitrary code on the target machine.
The problem is that the ASCII armor parser of PGP can create temporary files, and does not delete
these files if something unexpected happens in the input. It is possible to design a (malformed)
armored file that causes the armor parser to write any file to the hard disk if you double-click it.
This file will remain on the system. Anley enhances this attack by leaving behind a DLL file.
A DLL is a Dynamic Link Library that contains subroutines that can be called by programs.
All Windows programs consist of .exe files and DLLs. If the file that is left behind has a catchy name,
matching the name of an existing DLL, for instance a DLL used by the Acrobat reader, the code
in it could be executed any time the user reads a PDF document in the same directory. The
vulnerable versions are 5.0 to 7.0.4. NAI created a patch for this bug that makes sure that
PGP gives a warning when a file is left behind on the system. It also ensures that PGP uses the
right DLLs and not DLLs with the same name in the local directory. For more information on this bug
check http://www.atstake.com/research/advisories/2001/a040901-1.txt. The relevant source file is
discussed on page 75.
The example file Anley created is not ASCII armored at all. The bug is called the ASCII armor bug
because it is the ASCII armor parser that makes the mistake. His example file, called notes.doc.sig,
contains one literal packet. It starts like this:
af 30 c4 70 67 70 73 63 2e 64 6c 6c 00 20 61 72 62 69 74 72 61 72 79 20 66 69 6c 65 6e
61 6d 65 00 a0 ee 01 3d de f1 05 4c a0 16 92 fa de cb 69 cf 8a 85 3f 84 0b 62 c9 c5 ed
0d 16 35 d7 e2 21 b3 bd 52 a7 dc 56 89 36 d0 d1 4f 87 39 c4 2c 0d 2f 2e 2e 2f 2e 2e 2f
2e 2e 2f 2e 2e 2f 2e 2e 2f 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
4d 5a 90 00 03 . . .
In the first line, af (binary 10 1011 11) states the packet type: old format, type 11 decimal, and
length type 3, which OpenPGP defines as an indeterminate length. Read this way, 30 is the format
byte and c4 gives the length of the filename field (196 bytes), and 70–6c 6c spells pgpsc.dll. The 00
ends this filename. The rest
of the field is simply more room for a longer filename, and is ignored by PGP. Chris Anley filled
it with "arbitrary filename" and more spaces. The data starts at the 4d, and this byte and all the
following bytes are simply copied into the newly created file with the given name. The file pgpsc.dll is
a hostile replacement for a PGP file with the same name. It is supposed to verify all signatures successfully (even
the invalid ones), but on my system it just forces PGP to show an unintelligible error message. Note
that this file does not exactly conform to the OpenPGP standard for literal data packets, and that
is what confuses PGP.
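The bit fields of the first byte can be pulled apart with a few shifts. The sketch follows the old-format packet tag layout of RFC 2440 (bit 7 always set, bit 6 the format flag, bits 5-2 the packet type, bits 1-0 the length type); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct old_packet_tag {
    int is_packet;     /* bit 7: always 1 for a packet tag */
    int new_format;    /* bit 6: 0 = old format            */
    int type;          /* bits 5-2: packet type            */
    int length_type;   /* bits 1-0: 0, 1, 2 = 1-, 2-, 4-byte length; 3 = indeterminate */
};

struct old_packet_tag decode_old_tag(uint8_t tag)
{
    struct old_packet_tag t;
    t.is_packet   = (tag >> 7) & 1;
    t.new_format  = (tag >> 6) & 1;
    t.type        = (tag >> 2) & 0x0F;
    t.length_type = tag & 0x03;
    assert(t.is_packet);   /* otherwise this is not a packet tag at all */
    return t;
}
```

Applied to the byte af from the example file, this gives an old-format packet of type 11 with length type 3.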
The fact that PGP uses DLLs that are in its current directory is also a bug in PGP, but not
a cryptographic one. It is a Windows weakness that is known to Microsoft, and Microsoft gives a
solution on its website. This attack shows that the security of PGP depends heavily on the security
of the underlying operating system.
• The second strategy is to use the most valid user ID. This makes sense: PGP simply ignores
the user IDs it knows nothing about.
It turns out that PGP mixes these strategies in a very unfortunate way, and one can confuse users
with this property. To exploit the confusion, a key with two user IDs is needed. Below is a recipe
for making such keys. It describes what Eve must do to cheat Bob. She needs a little assistance
from Carol. Carol is a close friend of Bob, and Carol's key is listed as trusted in Bob's keyring.
It is a simple procedure that can be done without diving into file format details. The file name
suggestions are there to make it easier to refer back to files. It works with any new PGP version:
5, 6 and 7.
• Eve generates a key with the name Alice on it, with Alice’s email address, or an address that
appears to be Alice’s.
• Eve adds her own name and email address as a second user ID to the key. (click on the key,
select keys/add/name)
• Eve exports the key including the private key somewhere on the computer, under the name
temp1.asc.
• Eve deletes the Alice user ID from the key, and exports the key, now having the name Eve as
its only name, without the private key, in the file tocarol.asc. Then Eve deletes her key from
her keyring.
• Eve personally goes to Carol with her public key file tocarol.asc. Carol sees that the key
indeed belongs to Eve, so she signs it (and makes the signature exportable), and gives the public
key with the signature back to Eve, so she can use the signature to convince other people
she is Eve (filename fromcarol.asc). By making the signature Carol did not declare that she trusts
Eve; she only declared that the key indeed belongs to Eve.
• Eve first imports temp1.asc. Then she imports fromcarol.asc. Now she again has one key,
with two user ID’s. The first one is Alice, so the key has Alice as primary name in the PGP
keyring view. The second user ID, Eve, has Carol’s signature.
• Eve exports this key under the name Alice.asc without the private key, and tries to give it to
Bob when he is looking for Alice’s key.
If Bob imports this key into PGP, and tries to verify a signature made with the key, he will see
something weird. Please check the screenshot. The dark validity lights are green on a computer
screen, and mean valid. The lighter ones are also gray on color screens and mean invalidity.
5.7. MULTIPLE USER ID ATTACK 71
In the PGPLog window (upper left), you can see that the signature is considered valid, and
the signature seems to be made by Alice. What happens is that the name field uses strategy 1,
and gives the first name of the key. The validity light in this window however uses strategy 2, and
reasons using the most valid user ID: the second. Bob now falsely believes that the file comes from
Alice. We have successfully forged a signature!
The other windows also show interesting things. The PGPKeys window shows exactly what it
should show: the first user ID is not valid, the second is, and if one takes Alice as the name of
the key, then the key is not valid. The properties window however is confused and/or confusing. It
shows the name Alice, but the properties of Eve: it thinks the key is valid. It even allows raising the
trust bar. If Bob raises the trust bar, because he trusts Alice, the first user ID also becomes valid
(it is signed by a trusted key, namely itself) and Bob will start encrypting his secret messages to
Alice with Eve's key. Eve can now read everything Bob sends to Alice! PGP normally does not
allow setting the trust of invalid keys, but here it does. So both the validity and the trust slider
are wrong (or one can say that the properties window should show a different name; it is the mixing of
the two strategies that creates the problem).
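The mixing of the two strategies can be made concrete with a small model. Everything in it is hypothetical (the structs, the function names, and the reduction of validity to a boolean); it only mirrors the behaviour described above and is not taken from the PGP source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct user_id { const char *name; bool valid; };
struct pgp_key { const struct user_id *ids; size_t count; };

/* Strategy 1: the key is named after its first user ID. */
const char *display_name(const struct pgp_key *k)
{
    assert(k != NULL && k->count > 0);
    return k->ids[0].name;
}

/* Strategy 2: the key is as valid as its most valid user ID. */
bool validity_of_best_id(const struct pgp_key *k)
{
    for (size_t i = 0; i < k->count; i++)
        if (k->ids[i].valid) return true;
    return false;
}

/* Consistent alternative: report the validity of the name that is shown. */
bool validity_of_displayed_id(const struct pgp_key *k)
{
    assert(k != NULL && k->count > 0);
    return k->ids[0].valid;
}
```

For Eve's key, display_name gives "Alice" while validity_of_best_id reports valid, which is exactly the misleading combination. Pairing the displayed name with validity_of_displayed_id would report the key as invalid.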
Experienced and careful PGP users can discover this attack easily. It is also a very daring
attack: Eve identifies herself to Carol, and Bob knows her name immediately. For these reasons it
is not very likely that someone will do this attack for real. Nevertheless, if someone decided to try
it, he or she has a fair chance of getting away with it. And once you have fooled Bob, he might sign
the key and give it to his friends. If there is one gap in the web of trust, it can spread and infect
the whole system (again, it is not likely to happen in real life, but it is not completely unlikely either).
I think it is an attack with a very good effort/effect ratio: the attacker needs no computing power
at all, just a little bit of luck and carelessness.
Chapter 6
Sourcecode analysis
The method used instead of reading the source entirely was searching the code for interesting
pieces, large comments and specific keywords, and checking the methods called or the callers of methods. The
utility WinGrep by Huw Millington was of great value for these searches.
&bn is the memory address of the variable bn. bnBegin will store a reference to the new memory in
this variable. mgr is the memory manager that knows how to get memory, and the TRUE indicates
that the memory could contain sensitive information. It is hard to check whether Windows will not
store this information despite all PGP's efforts (Windows NT is a large program, and its source is not
available), but we did check that all FALSE indications are correct.
It is not hard to find all occurrences of the string "bnBegin" by using a program like Windows
Grep. Most of these invocations are correct, but a few needed more attention: the last arguments
are sometimes omitted, like
bnBegin(&a);
bnBegin(&e);
in the file bndsaprime.c. It was hard to find out what this means because there is no bnBegin
function or macro with one argument. It turned out that the one-argument bnBegin calls occur only in
old code that under normal settings is removed by the preprocessor. If the settings are changed,
the compiler complains about the functions. These findings are not dangerous in themselves, but they
do make one suspicious: how often and how thoroughly has this code been checked? Is this secure
programming?
6.4.1 RSA
The sourcecode of RSA is complicated by the fact that the code can use different libraries. pgpaltrsaglue.c
uses the RSAREF cryptographic library from RSA Data Security. This file is not used under default
settings. The same is true for pgpBRSAGlue.c, which uses the BSAFE library, also from RSA
Data Security. The next RSA file, pgpMSRSAGlue.c, uses the Microsoft Crypto API. It is also not
used in default settings (sigh). The file pgpRSAGlue is the one truly used. It uses PGP's own big
number code, which is the fastest.
I do not think anyone would want to use the alternative libraries. They were used because of
legal problems. The legal problems are solved. The alternative libraries are slower than PGP's
own, and some libraries have lower limits on the key size.
A very interesting file is pgpRSAkey.c. It contains the code to generate RSA keys. PGP
generates random primes by generating a random number of the specified size, setting the lowest
bit (to make it odd), and setting the highest two bits. The highest bit must be set to make the
number the requested size. The use of setting the second highest bit is unclear: the comment
states that forcing the number to be in the high range makes it harder to factor, but I think
they mean that setting these bits makes sure that n is the requested size (it can happen that you
multiply two 512 bit numbers, and get a 1023 bit number, and that is generally not what you
want). A prime just above the starting random number is returned by the function bnPrimeGen
(not always the first prime, because it shuffles the 256 numbers above the starting point).
It is funny to see that studying the exact details reveals some bits of n. Because $q, p > 3 \cdot 2^{512}$
it must be that $n > 9 \cdot 2^{1024}$, so n does not start with 1000.
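The effect of setting the top two bits can be checked on a scaled-down example with 16-bit candidates and a 32-bit product. The sketch is mine and does not come from pgpRSAkey.c; the function names are hypothetical, and the candidates are not tested for primality, since only their size matters here.

```c
#include <assert.h>
#include <stdint.h>

/* Force a 16-bit candidate into the shape described in the text:
 * lowest bit set (odd), highest two bits set. */
uint16_t shape_candidate(uint16_t random_value)
{
    return (uint16_t)(random_value | 0xC001u);
}

/* Number of significant bits in v (0 for v == 0). */
int bit_length(uint32_t v)
{
    int bits = 0;
    while (v != 0) { bits++; v >>= 1; }
    return bits;
}

uint32_t product(uint16_t p, uint16_t q)
{
    assert(p != 0 && q != 0);
    return (uint32_t)p * (uint32_t)q;
}
```

The smallest shaped candidate is 0xC001, and its square 0x90018001 already has the full 32 bits with leading bits 1001. With only the top bit set, 0x8001 squared is 0x40010001, which has just 31 bits.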
For e a fixed value called RSA DEFAULT EXPONENT is taken. In the sourcecode it is 17,
but the PGP version 7 I use uses 65537. The code contains a bug: after generating q, it should
check that gcd(e, q − 1) == 1. It does not, however. The code (line 1454) says
The test is redundant, because the bnPrimeGen function can take e as an extra argument, forcing
it to search for a prime for which e is invertible: it has been assured that the property holds
at the moment the number was generated, so the test is superfluous. Therefore this bug is not
exploitable, but it is worrying that there is a bug in such a crucial section of PGP.
6.4.2 DSA
The file pgpDSAkey.c contains the code for DSA signatures. PGP does the checks needed to verify
a signature (r, s < q). Note that PGP cannot deal with negative big integers, so that positivity
checks can be omitted. This is quite unusual: most big integer libraries are general-purpose, so they
admit negative numbers. It seems, however, to add to the safety of PGP.
The comment surrounding the DSA code also motivates the existence of a dummy pool: because
the public parameters are generated at the same time as the secret parameter x, they fear that it
could in theory be possible to reconstruct the state of the random pool from the public parameters
(by breaking the hash function, which is hard). To prevent this, they want to generate the public
parameters from only a small part of the pool. In this way the public parameters represent only
a small part of the pool. The dummy generator is seeded with 64 bits from the random pool. This
is not unguessable (80 bits is secure according to section 2), but that is not what we need here: 64
bits ensures that we have unique public parameters each time. Note that having the same public
parameters is not even a security risk: the algorithm also works with common public parameters.
It is just a more attractive target for attackers. (If PGP had common public parameters, rumors
about a backdoor could arise.)
6.5. RANDOM NUMBERS 77
CBC-hashed with the key buffer, and the result replaces one word of the pool (word means a block
of bytes with exactly the length of the current hash output). The XORing ensures that no entropy
is lost: the randomness in the overwritten data is still present in the outcome of the XOR function.
The hash hides all patterns in the input. The truly random data gathered from various sources
is often not uniformly distributed: certain values are more likely than others. Applying a hash
function neither increases nor decreases the amount of entropy in the data, but it does hide all
skew in the distribution of the data and returns uniformly distributed bits.
The most important question that remains is where the entropy in the input of the TRNG
comes from. There are a few ways to collect entropy in a standard PC. PGP uses keypress
timings, mouse movement timings, key presses (the actual values) and various elements called
OS data. What the OS data is depends on the operating system. Under Windows it consists of
statistical data about processes, which can be found in the registry. PGP normally collects these
data silently, but if a lot of entropy is needed, for instance when generating a new public/private
key pair, it shows a dialog where the user is asked to act randomly. A further source of random
data is a hardware random device. This is a bunch of electronics that can for instance be included
in a processor that measures noise. Intel thought about including it in modern processors around
1999, but they decided otherwise. The device is not present in any processor but available on some
motherboards (with certain Intel chipsets). The PGP code is ready to use it. In the sourcecode
distribution it is switched off (a flag is defined in pgpSDKbuildflags.h), but in the executables it is
turned on. These hardware random devices are very fast, and probably as safe as other randomness
sources. Check [9] for a report on this random device.
PGP carefully estimates the entropy in each sample. For the key presses, it does not assume
any entropy in a character that appeared more than once in the past four presses. It also tries to
estimate the entropy based on previous samples, using a formula for the entropy E in a sample X
with an expected value A: $2^E = |X - A|$. This formula does not work on every data stream, but
it seems safe for timings of random events.
It uses this formula with the expected value A based on the previous sample, with A based on a first
order derivative over the last two samples, with A based on a second order derivative over the previous
three samples, . . . , as far back as requested. For mouse events, it looks two samples back, but for
key presses there is a three step history. The minimum value of E found is used.
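The estimation scheme can be sketched with repeated differencing. This is a simplified illustration, not the PGP code: it keeps only the integer part of E, the function names are hypothetical, and the way the history is stored (newest sample first) is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* floor(log2(v)) for v >= 1; returns 0 for v <= 1. */
static int floor_log2(uint32_t v)
{
    int bits = 0;
    while (v > 1) { v >>= 1; bits++; }
    return bits;
}

/* hist[0] is the newest sample, hist[1] the one before, and so on.
 * Order 0 predicts "same as last sample", order 1 extrapolates linearly,
 * order 2 quadratically. The smallest prediction error is used. */
int estimate_entropy_bits(const int32_t *hist, int count, int max_order)
{
    assert(hist != NULL && max_order >= 0 && max_order <= 2 && count >= max_order + 2);
    int32_t work[4];
    int n = max_order + 2;              /* number of samples needed */
    for (int i = 0; i < n; i++) work[i] = hist[i];

    uint32_t smallest = UINT32_MAX;
    for (int order = 0; order <= max_order; order++) {
        /* one round of differencing: work[i] becomes work[i] - work[i+1] */
        for (int i = 0; i + 1 < n - order; i++)
            work[i] = work[i] - work[i + 1];
        uint32_t err = (uint32_t)abs(work[0]);   /* |X - A| for this order */
        if (err < smallest) smallest = err;
    }
    return floor_log2(smallest);   /* 2^E = |X - A|, so E = log2 |X - A| */
}
```

For the timings 100, 110, 150 (oldest first) the order-0 error is 40 and the order-1 error is 30, so 4 bits are credited. A perfectly regular sequence such as 110, 120, 130 is predicted exactly by order 1 and is credited nothing.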
What makes the code hard to read is that the PGP source code wants to use fractional entropy
without using floating-point numbers (not all processors have these instructions, especially the
original 8088 chip that PGP 1.0 was designed for and current embedded chips). To store an
amount of entropy it uses two integers: the first contains the integer number of bits, $\lfloor E \rfloor$ for
the mathematicians, the other stores $2^{E - \lfloor E \rfloor} \cdot 2^{32}$. It is multiplied by $2^{32}$ to make it an integer.
The value $2^E$ can be estimated directly with the formula given above, and the most important test,
$E > 1$, can be done by checking that $2^E > 2$.
With the help of the dummy pool, a strong pseudorandom number generator can be created
(a pseudorandom number generator is a deterministic function that produces random-looking
output). You need a random pool to create one, so that it is properly initialised. If you use the
dummy pool, it is of course not properly initialised, so this generator must then be seeded again
with output from the real pool, and is then used to generate the requested random data. This
procedure minimises the drain on the random pool and is safe. It is also a good example of not
so good programming: one author enforced the proper seeding of pseudorandom number generators
by demanding a random pool to create a generator, another author bypassed this security measure
by creating a dummy pool.
The files closely examined are:
• pgpRndWin32.h
• pgprandompool.h
• pgpRndWin32.c
• pgprandompool.c
/* Experimental XXX@@@ */
/* XXX: should I do this?*/
XXX Beware that calling this routine could be a security flaw,
I could not exploit the possible bugs mentioned above, and sometimes it is just a ’beauty error’:
For instance a LazyProgrammer error just indicates that they have no time to produce the right
kind of error message, but they still give an error message. It is hard to start just at a marked
point, and reason how to exploit it. If you start tracing how you got into the code, you quickly end
up with 15 source code files calling each other and things seeming to be OK.
6.7 Conclusions
It is hard to find all bugs in the source code of a large program, especially if you are not the author.
Simply reading the source code from start to end is not a good method. What we have done is
scan the code, read all comments, and study the parts we thought were interesting. This
results in two things:
• Finding implementation level details in certain algorithms. It shows how PGP exactly does
certain things, for instance in the PK section. The textbook descriptions leave a lot of details
to the programmer. It is important to know those details, because they might affect the
security.
• Finding certain bugs, omissions or peculiarities in the code. We have found a handful of these,
but are convinced that there are more.
The general impression is that the PGP programmers knew what they were doing, because
important things are checked and double-checked, and that a lot of effort has been put into PGP to
make it secure. This gives some confidence in PGP.
Unfortunately it is also clear that PGP contains bugs, like any other program. This does not
mean one should not use it, but it means that one must remain careful when using PGP. PGP
security is not absolute.
Chapter 7
Conclusion
It was very interesting to dive deeply into a complete implementation of a crypto system, especially
into such a famous system with such a colorful history as PGP. Maybe the most important conclusion
of our research is that building a cryptosystem involves more than combining a few algorithms. As
for the questions we posed, these are our answers:
On page 69 you will find my PGP attack plan. It allows one to forge signatures, and even read
messages in the right circumstances. So the answer is yes :-)
If PGP is broken in the future, it will most likely not be one of the algorithms in PGP
that is broken: those algorithms are very strong and many researchers have already tried to
attack them, without practical results. Most worrying are the bugs in PGP and the security
of the whole computer you use PGP on.
Bibliography
[1] Carlisle Adams. The CAST-128 Encryption Algorithm. RFC 2144, 1997.
[2] Derek Atkins, W. Stallings, and Philip Zimmermann. PGP Message Exchange Formats. RFC 1991, 1996.
[3] J. Callas, L. Donnerhacke, H. Finney, and R. Thayer. OpenPGP Message Format. RFC 2440, 1998.
[6] Simson Garfinkel. PGP: Pretty Good Privacy. O'Reilly and Associates, 1995.
[7] Peter Gutmann. Secure deletion of data from magnetic and solid-state memory. Sixth USENIX
Security Symposium Proceedings, 1996.
[8] Johan Hastad. Solving simultaneous modular equations of low degree. SIAM Journal on
Computing, 17(2), 1988. http://www.nada.kth.se/~johanh/papers.html.
[9] Benjamin Jun and Paul Kocher. The Intel Random Number Generator, 1999.
[10] B. Kaliski. RSA Encryption, PKCS 1 Version 1.5. RFC 2313, 1998.
[11] J. Katz and B. Schneier. A chosen ciphertext attack against several e-mail encryption protocols.
www.counterpane.com, 2000.
[12] Ueli Maurer. Modelling a public key infrastructure. Proceedings of the European Symposium on
Research in Computer Security 1996, (1146), 1996.
[13] Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone. Handbook of Applied
Cryptography. CRC Press, 1996. Available as PDF.
[15] NIST. Digital Signature Standard (DSS). FIPS Publication 186, 1994.
[16] NIST. Secure Hash Standard (SHS). FIPS Publication 180, 1994.
[17] Ron Rivest. The MD5 Message-Digest Algorithm. RFC 1321, 1992.
[18] Bruce Schneier. Applied Cryptography. John Wiley and Sons, 2nd edition, 1996.