Captcha: Using Hard AI Problems For Security
Luis von Ahn¹, Manuel Blum¹, Nicholas J. Hopper¹, and John Langford²
¹ Computer Science Dept., Carnegie Mellon University, Pittsburgh PA 15213, USA
² IBM T.J. Watson Research Center, Yorktown Heights NY 10598, USA
1 Introduction
A captcha is a program that can generate and grade tests that: (A) most
humans can pass, but (B) current computer programs can’t pass. Such a program
can be used to differentiate humans from computers and has many applications
for practical security, including (but not limited to):
– Search Engine Bots. Some web sites don’t want to be indexed by search
engines. There is an html tag to prevent search engine bots from reading
web pages, but the tag doesn’t guarantee that bots won’t read the pages; it
only serves to say “no bots, please”. Search engine bots, since they usually
belong to large companies, respect web pages that don’t want to allow them
in. However, in order to truly guarantee that bots won’t enter a web site,
captchas are needed.
– Worms and Spam. captchas also offer a plausible solution against email
worms and spam: only accept an email if you know there is a human behind
the other computer. A few companies, such as www.spamarrest.com, are
already marketing this idea.
– Preventing Dictionary Attacks. Pinkas and Sander [11] have suggested
using captchas to prevent dictionary attacks in password systems. The idea
is simple: prevent a computer from being able to iterate through the entire
space of passwords by requiring a human to type the passwords.
The goals of this paper are to lay a solid theoretical foundation for captchas,
to introduce the concept to the cryptography community, and to present several
novel constructions.
Note that from a mechanistic point of view, there is no way to prove that a
program cannot pass a test which a human can pass, since there is a program —
the human brain — which passes the test. All we can do is to present evidence
that it’s hard to write a program that can pass the test. In this paper, we take
an approach familiar to cryptographers: investigate state-of-the-art algorithmic
developments having to do with some problem, assume that the adversary does
not have algorithms for that problem that are much better than the state-of-the-art
algorithms, and then prove a reduction between passing a test and
exceeding the performance of state-of-the-art algorithms. In the case of
ordinary cryptography, it is assumed (for example) that the adversary cannot factor
1024-bit integers in any reasonable amount of time. In our case, we assume that
the adversary cannot solve an Artificial Intelligence problem with higher
accuracy than what's currently known to the AI community. This approach, if it
achieves widespread adoption, has the beneficial side effect of inducing security
researchers, as well as otherwise malicious programmers, to advance the field of
AI (much like computational number theory has been advanced since the advent
of modern cryptography).
Related Work
The first mention of ideas related to “Automated Turing Tests” seems to appear
in an unpublished manuscript by Moni Naor [10]. This excellent manuscript
contains some of the crucial notions and intuitions, but gives no proposal for an
Automated Turing Test, nor a formal definition. The first practical example of
an Automated Turing Test was the system developed by Altavista [8] to prevent
“bots” from automatically registering web pages. Their system was based on
the difficulty of reading slightly distorted characters and worked well in practice,
but was only meant to defeat off-the-shelf Optical Character Recognition (OCR)
technology. (Coates et al. [5], inspired by our work, and Xu et al. [14] developed
similar systems and provided more concrete analyses.) In 2000 [1], we introduced
the notion of a captcha as well as several practical proposals for Automated
Turing Tests.
This paper is the first to conduct a rigorous investigation of Automated
Turing Tests and to address the issue of proving that it is difficult to write a
computer program that can pass the tests. This, in turn, leads to a discussion
of using AI problems for security purposes, which has never appeared in the
literature. We also introduce the first Automated Turing Tests not based on the
difficulty of Optical Character Recognition. A related general interest paper [2]
has been accepted by Communications of the ACM. That paper reports on our
work, without formalizing the notions or providing security guarantees.
We assume that A can have precise knowledge of how V works; the only piece
of information that A can't know is r′, the internal randomness of V.
CAPTCHA
Intuitively, a captcha is a test V over which most humans have success close
to 1, and for which it is hard to write a computer program that has high success
over V . We will say that it is hard to write a computer program that has high
success over V if any program that has high success over V can be used to solve
a hard AI problem.
Remarks
1. The definition of an AI problem as a triple (S, D, f ) should not be inspected
with a philosophical eye. We are not trying to capture all the problems that
fall under the umbrella of Artificial Intelligence. We want the definition to
be easy to understand, we want some AI problems to be captured by it, and
we want the AI community to agree that these are indeed hard AI problems.
More complex definitions can be substituted for Definition 3 and the rest of
the paper remains unaffected.
2. A crucial characteristic of an AI problem is that a certain fraction of the
human population be able to solve it. Notice that we don’t impose a limit
on how long it would take humans to solve the problem. All that we require
is that some humans be able to solve it (even if we have to assume they will
live hundreds of years to do so). The case is not the same for captchas.
Although our definition says nothing about how long it should take a human
to solve a captcha, it is preferable for humans to be able to solve captchas
in a very short time. captchas which take a long time for humans to solve
are probably useless for all practical purposes.
AI Problems as Security Primitives
Gap Amplification
We stress that any positive gap between the success of humans and current
computer programs against a captcha can be amplified to a gap arbitrarily
close to 1 by serial repetition. The case for parallel repetition is more complicated
and is addressed by Bellare, Impagliazzo and Naor in [3].
Let V be an (α, β, η)-captcha, and let V_k^m be the test that results from
repeating V m times in series (with fresh randomness each time) and accepting
only if the prover passes V more than k times. Then for any ε > 0 there exist
m and k with 0 ≤ k ≤ m such that V_k^m is an (α, 1 − ε, ε)-captcha. In general,
we will have m = O((β − η)^{-2} ln(1/ε)) and sometimes much smaller. Since
captchas involve human use, it is desirable to find the smallest m possible.
This can be done by solving the following optimization problem:

\min_m \left\{ \exists k : \sum_{i=k+1}^{m} \binom{m}{i} \beta^i (1-\beta)^{m-i} \ge 1-\varepsilon \;\wedge\; \sum_{i=k+1}^{m} \binom{m}{i} \eta^i (1-\eta)^{m-i} \le \varepsilon \right\}
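As a concrete illustration, the short Python sketch below performs this search by brute force, evaluating the two binomial tails for increasing m. The function names and the example parameters are ours, not the paper's; this is a minimal sketch, not an optimized solver.

from math import comb

def tail(p, m, k):
    # Pr[Binomial(m, p) > k]: chance of passing more than k of m rounds.
    return sum(comb(m, i) * p**i * (1 - p)**(m - i) for i in range(k + 1, m + 1))

def smallest_m(beta, eta, eps, m_max=1000):
    # Smallest m (with a witness threshold k) such that humans pass V_k^m
    # with probability >= 1 - eps while programs pass with probability <= eps.
    for m in range(1, m_max + 1):
        for k in range(m):
            if tail(beta, m, k) >= 1 - eps and tail(eta, m, k) <= eps:
                return m, k
    return None

# Illustrative parameters: humans succeed 90% of the time, programs 30%.
print(smallest_m(0.90, 0.30, 0.01))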
4.1 MATCHA
A matcha instance is described by a triple M = (I, T, τ), where I is a
distribution on images and T is a distribution on image transformations that can be
easily computed using current computer programs. matcha is a captcha with
the following property: any program that has high success over M = (I, T, τ) can
be used to solve P1_{I,T}.
The matcha verifier starts by choosing a transformation t ← T. It then flips
a fair coin. If the result is heads, it picks k ← I and sets (i, j) = (k, k).
If the result is tails, it sets j ← I and i ← U([I] − {j}), where U(S) is the uniform
distribution on the set S. The matcha verifier sends the prover (i, t(j)) and sets
a timer to expire in time τ; the prover responds with res ∈ {0, 1}. Informally,
res = 1 means that i = j, while res = 0 means that i ≠ j. If the verifier's timer
expires before the prover responds, the verifier rejects. Otherwise, the verifier
makes a decision based on the prover's response res and whether i is equal to j:
– If i = j and res = 1, then matcha accepts.
– If i = j and res = 0, then matcha rejects.
– If i ≠ j and res = 1, then matcha rejects.
– If i ≠ j and res = 0, then matcha plays another round.
In the last case, matcha starts over (with a fresh set of random coins): it
flips another fair coin, picks another pair of images (i, j) depending on
the outcome of the coin, and so on.
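A minimal Python sketch of this verifier loop follows. The images/draw_transform/prover interfaces are hypothetical stand-ins, I is taken to be uniform over [I] for simplicity, and the τ timer is elided.

import random

def matcha_verify(images, draw_transform, prover):
    # Verifier loop for matcha. `images` plays the role of [I] (assumed to
    # hold at least two distinct images), `draw_transform` samples t <- T,
    # and `prover(i, tj)` returns res in {0, 1}. Timing (tau) is elided.
    while True:
        t = draw_transform()
        if random.random() < 0.5:                      # heads: i = j
            j = random.choice(images)
            i = j
        else:                                          # tails: i != j
            j = random.choice(images)
            i = random.choice([x for x in images if x != j])
        res = prover(i, t(j))
        if i == j:
            return res == 1          # accept iff prover answers "same"
        if res == 1:
            return False             # "same" on distinct images: reject
        # i != j and res == 0: play another round with fresh randomness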
Lemma 1. Any program that has success greater than η over M = (I, T, τ) can
be used to (δ, τ|[I]|)-solve P1_{I,T}, where

\delta \ge \frac{\eta}{1 + 2|[I]|(1 - \eta)}.
Proof. Let B be a program that runs in time at most τ and has success σ_B ≥ η
over M. Using B we construct a program A_B that is a (δ, τ|[I]|)-solution to
P1_{I,T}.
The input to A_B will be an image, and the output will be another image.
On input j, A_B will loop over the entire database of images of M (i.e., the set
[I]), each time feeding B the pair of images (i, j), where i ∈ [I]. Afterwards,
A_B collects all the images i ∈ [I] on which B returned 1 (i.e., all the images in
[I] that B thinks j is a transformation of). Call the set of these images S. If S
is empty, then A_B returns an element chosen uniformly from [I]. Otherwise, it
picks an element of S uniformly at random and returns it.
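In code, A_B is only a few lines. The sketch below assumes B is available as a callable taking a pair (i, t(j)) and returning 0 or 1.

import random

def A_B(B, images, tj):
    # Solve P1_{I,T} given a matcha-passing program B: on input t(j),
    # try every i in [I], then answer uniformly among B's candidates
    # (or uniformly over all of [I] if B recognized nothing).
    # Running B over all of [I] is what gives the tau*|[I]| time bound.
    S = [i for i in images if B(i, tj) == 1]
    return random.choice(S if S else images)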
We show that A_B is a (δ, τ|[I]|)-solution to P1. Let p_0 = Pr_{T,I,r}[B_r(i, t(i)) = 0]
and let p_1 = Pr_{T,j←I,i,r}[B_r(i, t(j)) = 1]. Each round of M rejects B with
probability (p_0 + p_1)/2 and continues with probability (1 − p_1)/2, so

\Pr_{r,r'}[\langle M_r, B_{r'} \rangle = \text{reject}] = 1 - \sigma_B = \frac{p_0}{2} + \frac{p_1 + (1 - p_1)(1 - \sigma_B)}{2} = \frac{p_0 + p_1}{1 + p_1},

where (2) follows by the definition of the procedure A_B, (3) follows by Jensen's
inequality and the fact that f(x) = 1/x is convex, (4) follows because

E_{T,I}[|S|] \le 1 + \sum_{i \ne j} \Pr_{I,r}[B(i, t(j)) = 1],

and (5) follows by the inequalities for p_0, p_1 and σ_B given above. This completes
the proof.
4.2 PIX
An instance P2_{I,T,λ} can sometimes be used almost directly as a captcha. For
instance, if I is a distribution over images containing a single word and λ maps
an image to the word contained in it, then P2_{I,T,λ} can be used directly as a
captcha. Similarly, if all the images in [I] are pictures of simple concrete objects
and λ maps an image to the object that is contained in the image, then P2_{I,T,λ}
can be used as a captcha.
Formally, a pix instance is a tuple X = (I, T, L, λ, τ). The pix verifier works
as follows. First, V draws i ← I and t ← T. V then sends to P the message
(t(i), L) and sets a timer for τ. P responds with a label l ∈ L. V accepts if
l = λ(i) and its timer has not expired, and rejects otherwise.
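A minimal sketch of this verifier in Python, with hypothetical sampling and prover interfaces standing in for I, T, λ, and P; for simplicity the timeout is checked after the prover answers rather than enforced preemptively.

import time

def pix_verify(draw_image, draw_transform, labels, label_of, prover, tau):
    # One run of the pix verifier X = (I, T, L, lambda, tau).
    i = draw_image()                       # i <- I
    t = draw_transform()                   # t <- T
    start = time.monotonic()
    answer = prover(t(i), labels)          # prover receives (t(i), L)
    on_time = time.monotonic() - start <= tau
    return on_time and answer == label_of(i)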
Theorem 2. If P2_{I,T,λ} is (δ, τ)-hard and X = (I, T, L, λ, τ) is (α, β)-human
executable, then X is an (α, β, δ)-captcha.
Various instantiations of pix are in use at major internet portals such as Yahoo!
and Hotmail. Other, less conventional ones, like Animal-PIX, can be found at
www.captcha.net. Animal-PIX presents the prover with a distorted picture of a
common animal (like the one shown below) and asks it to choose among twenty
different possibilities (monkey, horse, cow, et cetera).
Fig. 2. Animal-Pix.
Steganography Definitions
Fix a distribution over images I, and a set of keys K. A steganographic protocol
or stegosystem for I is a pair of efficient probabilistic algorithms (SE, SD) where
SE : K × {0, 1} → [I]^ℓ and SD : K × [I]^ℓ → {0, 1}, which have the additional
property that Pr_{K,r,r'}[SD_r(K, SE_{r'}(K, σ)) ≠ σ] is negligible (in ℓ and |K|) for
any σ ∈ {0, 1}. We will describe a protocol for transmitting a single bit σ ∈
{0, 1}, but it is straightforward to extend our protocol and proofs by serial
composition to any message in {0, 1}^∗ with at most linear decrease in security.
Definition 6. A stegosystem is steganographically secret for I if the distributions
{SE_r(K, σ) : K ← K, r ← {0, 1}^∗} and I^ℓ are computationally indistinguishable
for any σ ∈ {0, 1}.
Steganographic secrecy ensures that an eavesdropper cannot distinguish traffic
produced by SE from I. Alice, however, is worried about a somewhat malicious
adversary who transforms the images she transmits to Bob. This adversary
is restricted by the fact that he must transform the images transmitted between
many pairs of correspondents, and may not transform them in ways that make
them unrecognizable to humans, since he may not disrupt the communications
of legitimate correspondents. Thus the adversary's actions, on seeing the image
i, are restricted to selecting some transformation t according to a distribution
T, and replacing i by t(i). Denote by t_{1...ℓ} ← T^ℓ the action of independently
selecting ℓ transformations according to T, and denote by t_{1...ℓ}(i_{1...ℓ}) the action
of applying the ℓ transformations element-wise to the ℓ images.
Definition 7. A stegosystem (SE, SD) is steganographically robust against T
if it is steganographically secret and

\Pr_{t_{1...\ell} \leftarrow T^\ell,\, r,\, r',\, K}[SD_r(K, t_{1...\ell}(SE_{r'}(K, \sigma))) \ne \sigma]

is negligible (in ℓ and |K|) for any σ ∈ {0, 1}.
Remarks
1. Better solutions to P2_{I,T,λ} imply more efficient stegosystems: if δ is larger,
then ℓ can be smaller and fewer images need to be transmitted to send a bit
secretly and robustly.
2. Since we assume that P2_{I,T,λ} (or, as the case may be, P1_{I,T}) is easy for
humans, our protocol could be implemented as a cooperative effort between
the human recipient and the decoding procedure (without the need for a
program that solves P1_{I,T} or P2_{I,T,λ}). However, decoding each bit of the secret
message requires classifying many images, so a human would likely
fail to complete the decoding well before any sizeable hidden message could
be extracted (this is especially true when we are dealing with P1_{I,T} and
a large set [I]: a human would have to search the entire set [I] as many as
ℓ times for each transmitted bit). Thus, to be practical, a (δ, τ)-solution (for
small τ) to P1_{I,T} or P2_{I,T,λ} will be required.
Conclusion
We believe that the fields of cryptography and artificial intelligence have much to
contribute to one another. captchas represent a small example of this possible
symbiosis. Reductions, as they are used in cryptography, can be extremely useful
for the progress of algorithmic development. We encourage security researchers
to create captchas based on different AI problems.
Acknowledgments
We are grateful to Udi Manber for his suggestions. We also thank Lenore Blum,
Roni Rosenfeld, Brighten Godfrey, Moni Naor, Henry Baird and the anonymous
Eurocrypt reviewers for helpful discussions and comments. This work was
partially supported by the National Science Foundation (NSF) grants CCR-0122581
and CCR-0085982 (The Aladdin Center). Nick Hopper is also partially supported
by an NSF graduate research fellowship.
References
1. Luis von Ahn, Manuel Blum, Nicholas J. Hopper and John Langford. The
CAPTCHA Web Page: http://www.captcha.net. 2000.
2. Luis von Ahn, Manuel Blum and John Langford. Telling Humans and Computers
Apart (Automatically) or How Lazy Cryptographers do AI. To appear in
Communications of the ACM.
3. Mihir Bellare, Russell Impagliazzo and Moni Naor. Does Parallel Repetition Lower
the Error in Computationally Sound Protocols? In 38th IEEE Symposium on
Foundations of Computer Science (FOCS '97), pages 374-383. IEEE Computer Society,
1997.
4. Mikhail M. Bongard. Pattern Recognition. Spartan Books, Rochelle Park NJ, 1970.
5. A. L. Coates, H. S. Baird, and R. J. Fateman. Pessimal Print: A Reverse Turing
Test. In Proceedings of the International Conference on Document Analysis and
Recognition (ICDAR’ 01), pages 1154-1159. Seattle WA, 2001.
6. Scott Craver. On Public-key Steganography in the Presence of an Active Warden.
In Proceedings of the Second International Information Hiding Workshop, pages
355-368. Springer, 1998.
7. Nicholas J. Hopper, John Langford and Luis von Ahn. Provably Secure
Steganography. In Advances in Cryptology, CRYPTO '02, volume 2442 of Lecture Notes in
Computer Science, pages 77-92. Santa Barbara, CA, 2002.
8. M. D. Lillibridge, M. Abadi, K. Bharat, and A. Broder. Method for selectively
restricting access to computer systems. US Patent 6,195,698. Applied April 1998
and Approved February 2001.
9. Greg Mori and Jitendra Malik. Breaking a Visual CAPTCHA.
Unpublished Manuscript, 2002. Available electronically:
http://www.cs.berkeley.edu/~mori/gimpy/gimpy.pdf.
10. Moni Naor. Verification of a human in the loop or Identification via
the Turing Test. Unpublished Manuscript, 1997. Available electronically:
http://www.wisdom.weizmann.ac.il/~naor/PAPERS/human.ps.
11. Benny Pinkas and Tomas Sander. Securing Passwords Against Dictionary Attacks.
In Proceedings of the ACM Computer and Security Conference (CCS’ 02), pages
161-170. ACM Press, November 2002.
12. S. Rice, G. Nagy, and T. Nartker. Optical Character Recognition: An Illustrated
Guide to the Frontier. Kluwer Academic Publishers, Boston, 1999.
13. Adi Shamir and Eran Tromer. Factoring Large Numbers with the
TWIRL Device. Unpublished Manuscript, 2003. Available electronically:
http://www.cryptome.org/twirl.ps.gz.
14. J. Xu, R. Lipton and I. Essa. Hello, are you human. Technical Report
GIT-CC-00-28, Georgia Institute of Technology, November 2000.
A Proof of Proposition 1
Proof. Consider, for any 1 ≤ j ≤ ℓ and x ∈ [I], the probability ρ_{jx} that i_j = x,
i.e., ρ_{jx} = Pr[i_j = x]. The image x is returned in the jth step only under one of
the following conditions:
1. D_0: d_0 = x and F_K(j, λ(d_0)) = σ; or
2. D_1: d_1 = x and F_K(j, λ(d_0)) = 1 − σ.
Note that these events are mutually exclusive, so that ρ_{jx} = Pr[D_0] + Pr[D_1].
Suppose that we replace F_K by a random function f : {1, . . . , ℓ} × L → {0, 1}.
Then we have that Pr_{f,d_0}[D_0] = (1/2) Pr_I[x] by the independence of f and d_0, and
Pr_{f,d_1}[D_1] = (1/2) Pr_I[x], by the same reasoning. Thus ρ_{jx} = Pr_I[x] when F_K
is replaced by a random function. Further, for a random function the ρ_{jx} are
all independent. Thus for any σ ∈ {0, 1} we see that SE(K, σ) and I^ℓ are
computationally indistinguishable, by the pseudorandomness of F.
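To make the construction analyzed in this proof concrete, here is a minimal Python sketch of the one-bit encoder it describes, together with the majority-vote decoder its analysis suggests. Instantiating F_K with HMAC-SHA256 and the draw_image/label_of interfaces are our assumptions for illustration, not part of the paper.

import hmac, hashlib

def F(K, j, label):
    # Pseudorandom bit F_K(j, label); HMAC-SHA256 is one possible PRF.
    d = hmac.new(K, f"{j}|{label}".encode(), hashlib.sha256).digest()
    return d[0] & 1

def SE(K, sigma, draw_image, label_of, ell):
    # Encode bit sigma as ell images: at step j, draw d0 <- I and keep it
    # if F_K(j, lambda(d0)) = sigma; otherwise send a fresh draw d1.
    out = []
    for j in range(ell):
        d0 = draw_image()
        out.append(d0 if F(K, j, label_of(d0)) == sigma else draw_image())
    return out

def SD(K, images, label_of):
    # Each received image i_j satisfies F_K(j, lambda(i_j)) = sigma with
    # probability 3/4 (1 if d0 was kept, 1/2 if d1 was sent), so a majority
    # vote over the ell per-image bits recovers sigma w.h.p. for large ell.
    ones = sum(F(K, j, label_of(i)) for j, i in enumerate(images))
    return 1 if 2 * ones > len(images) else 0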