
CAPTCHA Presentation

This document discusses CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart). It defines CAPTCHAs and describes their background, types, applications, how they are constructed and broken, and the issues they raise. The key points: CAPTCHAs aim to distinguish humans from bots by presenting challenges that are easy for humans but difficult for computers; they have many applications, including preventing spam and abuse; and they pose usability and accessibility problems for some users.

Uploaded by

bsbharath1987
Copyright
© Attribution Non-Commercial (BY-NC)

By

BHARATH B S
4VV05CS009
Agenda

 Definition
 Background
 Types
 Applications
 Constructing CAPTCHAs
 Breaking CAPTCHAs
 Issues with CAPTCHAs
 Conclusion
Intro

 CAPTCHA Completely Automated


Public Turing test to tell Computers
and Humans Apart

 Invented at CMU by Luis von Ahn,


Manuel Blum, et. al

 A program that is a challenge –


response test to separate humans
from computer programs
 Generic CAPTCHAs distort letters and numbers
 Distorted characters are presented to the user
 The user has to recognize the distorted letters
 If the entered letters are correct, the user is inferred to be a human and allowed access
 Humans can read the distorted and noisy text
 Current OCR programs cannot read them reliably


Background

 Why was CAPTCHA needed?
 Sabotage of online polls
 Spam emails
 Abuse of free online accounts
 Tampering with rankings on recommendation systems (like eBay, Amazon)
 AltaVista was the first to use a crude CAPTCHA on its site
 Resulted in 95% spam reduction
 Yahoo partnered with CMU to counter these threats in its Messenger chat service

 Luis von Ahn and Manuel Blum of CMU coined the term CAPTCHA in 2000
 What is a Turing test?
 Proposed by Alan Turing
 To test a machine’s level of intelligence
 A human judge questions two participants, one of them a machine, without knowing which is which
 If the judge can’t tell which is the machine, the machine passes the test
 CAPTCHA employs a reverse Turing test: the judge is the CAPTCHA program and the participant is the user
 If the user passes the CAPTCHA, the user is taken to be human
 If the user fails, the user is taken to be a machine
Types of CAPTCHAs

 Text based:

 Simple questions in plain language:
 What is the sum of three and thirty-five?
 If today is Saturday, what is the day after tomorrow?
 Which of mango, table, and water is a fruit?
 Very effective, but needs a large question bank
 Hard for users with cognitive disabilities
 Gimpy:
 Designed by Yahoo and CMU
 Picks 10 random words from a dictionary, distorts them, and adds noise
 The user has to recognize at least 3 of the words
 If the user is correct, access is granted
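The Gimpy acceptance rule above (10 words shown, at least 3 recognized) can be sketched as follows. This is only an illustration: the word list and the function names are hypothetical, and the distortion and noise steps are not shown.

```python
import random

# Hypothetical word list; a real deployment would draw from a large dictionary.
WORDS = ["apple", "river", "stone", "cloud", "tiger", "piano",
         "candle", "forest", "mirror", "garden", "window", "bridge"]

def make_challenge(rng, count=10):
    """Pick `count` distinct words to render (rendering/distortion omitted)."""
    return rng.sample(WORDS, count)

def passes(challenge, answers, required=3):
    """Admit the user if at least `required` distinct shown words were typed."""
    shown = {word.lower() for word in challenge}
    typed = {answer.strip().lower() for answer in answers}
    return len(shown & typed) >= required
```

Counting distinct words means that typing the same correct word three times does not pass.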
 EZ-Gimpy:
 A modified version of Gimpy
 Yahoo used this version in Messenger
 Has only 1 random string of characters
 Not a dictionary word, so not prone to
dictionary attack
 Not a good implementation, already
broken by OCRs
 MSN’s Passport service CAPTCHAs:
 Provided for Microsoft’s MSN services
 Use 8 characters
 Warping is used to distort
 Very strong implementation, hasn’t been broken
 It is segmentation-resistant
 Graphic based CAPTCHAs:

 BONGO:
 Named after M. M. Bongard, a pattern recognition expert
 The user has to solve a pattern recognition problem
 The user must identify the characteristic that distinguishes two sets of figures
 Then say which set a given figure belongs to
 PIX:
 Uses a large database of labelled images
 It shows a set of images; the user has to recognize the feature they have in common
 E.g., pick the common characteristic among the following four pictures: “Aeroplane”
 Audio CAPTCHAs:
 Consist of a downloadable audio clip
 The user listens and enters the spoken word
 Help visually impaired users
 Example: Google’s audio-enabled CAPTCHA
 Not popular
Applications

 Protect online polls
 Prevent Web registration abuse, protect passwords from brute-force attack
 Prevent comment spam and spam emails
 E-ticketing: prevent ticket scalping


 Verify digitized books: reCAPTCHA
 Used in Google Books Project
 Two words are shown; the program knows only the first word
 If the user enters the first word correctly, the program assumes that the second, unknown word was also entered correctly
 The second word then becomes “known”
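The two-word idea above can be sketched roughly as below. The class name, method names, and the `votes_needed` parameter are assumptions made for illustration; with `votes_needed=1` the sketch behaves as the slide describes, where one correct control answer is enough to treat the second word as known.

```python
from collections import Counter

class TwoWordChallenge:
    """One known (control) word plus one unknown word from a scanned book."""

    def __init__(self, control_answer, votes_needed=1):
        self.control_answer = control_answer
        self.votes_needed = votes_needed  # assumption: agreement threshold
        self.votes = Counter()            # readings proposed for the unknown word

    def submit(self, control_guess, unknown_guess):
        """Return True if the user passes; record a vote for the unknown word."""
        if control_guess.strip().lower() != self.control_answer.lower():
            return False  # control word wrong: ignore the other reading
        self.votes[unknown_guess.strip().lower()] += 1
        return True

    def resolved_word(self):
        """Return the accepted reading of the unknown word, or None if undecided."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.votes_needed else None
```

Raising `votes_needed` above 1 would require several users to agree before the second word is accepted, which guards against a single careless answer.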
 Help advance AI knowledge
 CAPTCHAs are based on hard AI problems
 A win-win scenario:
 If a CAPTCHA is broken by a bot, a hard AI problem is solved
 If it is not yet broken, the current implementation is able to withstand attacks
 Thus AI knowledge is advanced if CAPTCHAs are broken
Constructing CAPTCHAs

 Things to keep in mind:
 Don’t store CAPTCHA solution in Web page’s metadata
 A CAPTCHA is no good if it doesn't distort
 Need a large database of different CAPTCHA questions
 Avoid repetition of questions


 CAPTCHA Logic:
 Generate the question
 Persist the correct answer
 Present the question to the user
 Evaluate the answer; if incorrect, start again with a different CAPTCHA
 If correct, allow access to the user
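The CAPTCHA logic listed above can be sketched as a small server-side service. This is a minimal illustration, not a hardened implementation: `CaptchaService`, its method names, and the arithmetic question format are all assumptions, and a real deployment would also render a distorted image.

```python
import random
import secrets

class CaptchaService:
    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.pending = {}  # token -> correct answer, kept on the server only

    def generate(self):
        """Generate a question and persist its answer under a fresh token."""
        a, b = self.rng.randint(1, 50), self.rng.randint(1, 50)
        token = secrets.token_hex(16)
        self.pending[token] = str(a + b)
        return token, f"What is the sum of {a} and {b}?"

    def evaluate(self, token, answer):
        """Check an answer once. Returns (passed, next_challenge_or_None)."""
        expected = self.pending.pop(token, None)  # each token is single-use
        if expected is not None and answer.strip() == expected:
            return True, None              # correct: allow access
        return False, self.generate()      # incorrect: issue a different CAPTCHA
```

Only the token and the question text go to the browser; the answer stays in `pending`, so it never appears in the page itself.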


 Embeddable CAPTCHAs:
 Freely available: just embed the code into the Web page’s HTML, e.g., from www.recaptcha.net
 No maintenance

 Custom CAPTCHAs:
 Fit the theme of the page
 Better protected from spammers
 Can be written in many languages and platforms, e.g., Perl, .NET, ASP, JavaScript
 Guidelines:
 Accessibility
 Image security
 Script security
 Security after widespread adoption
 Custom implementation or a general CAPTCHA?
Breaking CAPTCHAs

 Cracking CAPTCHAs through programs
 Convert the CAPTCHA into greyscale
 Detect patterns in the image corresponding to characters
 Or, read that user’s session files to learn the CAPTCHA word
 Countermeasure: Only store a hash of the CAPTCHA word in session files
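Storing only a hash of the CAPTCHA word might look like the sketch below. Note one deliberate substitution: it uses a keyed hash (HMAC) rather than a plain hash, because a short word is easy to guess by hashing every candidate. `SERVER_KEY`, `digest_for`, and `check` are hypothetical names, and the key is assumed to live outside the session files.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; assumed to be stored outside session files.
SERVER_KEY = secrets.token_bytes(32)

def digest_for(word):
    """Return the keyed digest to store in the session instead of the word."""
    normalized = word.strip().lower().encode("utf-8")
    return hmac.new(SERVER_KEY, normalized, hashlib.sha256).hexdigest()

def check(stored_digest, guess):
    """Compare the user's guess against the stored digest in constant time."""
    return hmac.compare_digest(digest_for(guess), stored_digest)
```

Someone who reads the session file sees only the digest, which cannot be turned back into the word without the key.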
 Greg Mori and Jitendra Malik have broken text CAPTCHAs, e.g., EZ-Gimpy
 To break this CAPTCHA:
 Segmentation: Locate possible letters in the image
 Construct graph of consistent letters
 Find out plausible words from the graph, use scores to rank
 Example scores: roll=11.94, profit=9.42 (better match)
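The graph-and-ranking steps can be illustrated with a much simplified sketch. It assumes the segmentation step has already produced letter hypotheses with positions and matching costs, that a lower score means a better match (as the example scores suggest), and that plausible words are those found in a dictionary. All names here are hypothetical, and the letter-matching method itself is not shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hypothesis:
    letter: str
    start: int   # left edge in pixels
    end: int     # right edge in pixels
    cost: float  # matching cost; lower is assumed to mean a better fit

def consistent(a, b, max_gap=3):
    """b may follow a if it starts strictly to the right, near where a ends."""
    return b.start > a.start and 0 <= b.start - a.end <= max_gap

def candidate_words(hyps, dictionary):
    """Walk the graph of consistent letters; rank dictionary words by mean cost."""
    results = {}

    def walk(path):
        word = "".join(h.letter for h in path)
        if word in dictionary:
            score = sum(h.cost for h in path) / len(path)
            results[word] = min(score, results.get(word, float("inf")))
        for nxt in hyps:
            if consistent(path[-1], nxt):
                walk(path + [nxt])

    for h in hyps:
        walk([h])
    return sorted(results.items(), key=lambda item: item[1])  # best match first
```

Each path through the graph spells a candidate string; only strings found in the dictionary are kept, and the lowest mean cost ranks first.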
 Social engineering to break
CAPTCHAs:
 Spammer encounters a CAPTCHA
 That CAPTCHA is copied to another site
 Humans are baited, e.g., with free MP3s
 To get those MP3s, users are told to solve the copied CAPTCHA
 The answer is routed back to the spammer
 Countermeasure: Set a time-to-live period for each question
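A time-to-live check could be sketched like this. The class name, the 120-second default, and the injectable `clock` parameter are assumptions chosen for illustration. The idea is that an answer relayed through a bait site may arrive after the challenge has expired.

```python
import time

class ExpiringChallenges:
    def __init__(self, ttl_seconds=120, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so the expiry can be tested
        self.issued = {}     # challenge id -> (answer, time issued)

    def issue(self, challenge_id, answer):
        """Record when a challenge was handed out."""
        self.issued[challenge_id] = (answer, self.clock())

    def verify(self, challenge_id, guess):
        """Accept only a correct answer to an unexpired, unused challenge."""
        entry = self.issued.pop(challenge_id, None)  # single use
        if entry is None:
            return False
        answer, issued_at = entry
        if self.clock() - issued_at > self.ttl:
            return False  # too old: possibly relayed to another site
        return guess.strip().lower() == answer.lower()
```

A short time-to-live narrows the window for relaying a challenge but also gives slower human users less time, so the value is a trade-off.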

 CAPTCHA cracking as a business:
 Firms offer CAPTCHA-cracking services for a fee
Issues with CAPTCHAs

 Usability issues:
 W3C guidelines call for the Web to be accessible to all people
 Some CAPTCHAs are inaccessible to people with visual impairments or cognitive disabilities

 Compatibility issues:
 JavaScript may need to be enabled in the browser
 Some may need the Adobe Flash plugin
Summary

 CAPTCHAs are an effective way to counter bots and reduce spam
 They serve a dual purpose: they also help advance AI knowledge
 Applications are varied, from stopping bots to character recognition and pattern matching
 Some issues with current implementations represent challenges for future improvements
