
Captcha: Rajni Sharma Cse-2 Sem (M. Tech)

This document discusses CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart). It defines CAPTCHAs, describes their background and invention, and covers types (e.g. text, graphic), applications, construction, attempts to break them, and issues. CAPTCHAs are challenges intended to be solvable by humans but not computers to distinguish the two. They have been used successfully to reduce spam but creating accessible and secure CAPTCHAs while advancing AI remains an ongoing challenge.

Uploaded by

Rajni Sharma
Copyright
© Attribution Non-Commercial (BY-NC)

CAPTCHA

Rajni Sharma, CSE 2nd Sem (M.Tech)

Agenda
Definition
Background
Types
Applications
Constructing CAPTCHAs
Breaking CAPTCHAs
Issues with CAPTCHAs
Conclusion

Intro
CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart
Invented at CMU by Luis von Ahn, Manuel Blum, et al.
A program that runs a challenge-response test to separate humans from computer programs

Generic CAPTCHAs distort letters and numbers
Distorted characters are presented to the user
The user has to recognize the distorted letters
If the guessed letters are correct, the user is inferred to be a human and allowed access
Else, the user is inferred to be a bot and denied access

Humans can read the distorted and noisy text
Current OCRs cannot read them

Background
Why was CAPTCHA needed?
  Sabotage of online polls
  Spam emails
  Abuse of free online accounts
  Tampering with rankings on recommendation systems (like eBay, Amazon)

AltaVista first used a crude CAPTCHA on its site, which resulted in a 95% spam reduction
Yahoo partnered with CMU to counter these threats in its Messenger chat service
Luis von Ahn and Manuel Blum of CMU trademarked CAPTCHA in 2000

What is a Turing test?
Proposed by Alan Turing
Tests a machine's level of intelligence
A human judge asks questions to two participants; one is a machine, and the judge doesn't know which is which
If the judge can't tell which is the machine, the machine passes the test
CAPTCHA employs a reverse Turing test: judge = CAPTCHA program, participant = user
  If the user passes the CAPTCHA, the user is taken to be human
  If the user fails, the user is taken to be a machine

Types of CAPTCHAs
Text based:
Simple, normal language questions:
  What is the sum of three and thirty-five?
  If today is Saturday, what is the day after tomorrow?
  Which of mango, table, water is a fruit?

  Very effective, but needs a large question bank
  Cognitively challenged users find it hard
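
A question bank of this kind can be generated instead of written by hand. The sketch below is only an illustration of the first sample question; the helper names `number_to_words` and `make_sum_question` are hypothetical and do not come from the slides.

```python
import random

# Word tables for spelling out numbers below 100.
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..99 (hypothetical helper)."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only supports 0..99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def make_sum_question(rng: random.Random) -> tuple[str, str]:
    """Return (question text, expected answer as a digit string)."""
    a = rng.randint(1, 49)
    b = rng.randint(1, 49)
    question = (
        f"What is the sum of {number_to_words(a)} and {number_to_words(b)}?"
    )
    return question, str(a + b)
```

Spelling the operands as words means a bot has to parse the sentence, not just pick out digits, though a generator this simple would be easy to reverse; it illustrates the question format and is not a secure design.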

Gimpy:
  Designed by Yahoo and CMU
  Picks 10 random words from a dictionary, distorts them, and fills the image with noise
  The user has to recognize at least 3 words
  If the user is correct, the user is admitted

EZ-Gimpy:
  A modified version of Gimpy
  Yahoo used this version in Messenger
  Has only 1 random string of characters
  Not a dictionary word, so not prone to dictionary attack
  Not a good implementation, already broken by OCRs

MSN's Passport service CAPTCHAs:
  Provided for Microsoft's MSN services
  Use 8 characters
  Warping is used to distort them
  Considered a very strong implementation, reported as not yet broken
  It is segmentation-resistant

Graphic based CAPTCHAs:

BONGO:
  Named after M. M. Bongard, a pattern recognition expert
  The user has to solve a pattern recognition problem
  The user has to identify the characteristic that distinguishes two sets of figures
  Then tell which set a given figure belongs to

PIX:
  Uses a large database of labelled images
  Shows a set of images; the user has to recognize the feature common to them
  E.g., pick the common characteristic among the following four pictures: Aeroplane

Audio CAPTCHAs:
  Consist of a downloadable audio clip
  The user listens and enters the spoken word
  Help visually disabled users
  Not popular

[Figure: Google's audio-enabled CAPTCHA]

Applications
Protect online polls
Prevent Web registration abuse, protect passwords from brute-force attacks
Prevent comment spam and spam emails
E-ticketing, prevent scalping

Verify digitized books: reCAPTCHA
  Used in the Google Books project
  Two words are shown; the program knows the first word
  If the user enters the first word correctly, it assumes that the second, unknown word is also entered correctly
  The second word becomes known
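
The two-word check described above can be sketched as follows. This is a minimal illustration, not reCAPTCHA's actual code: the names `check_pair` and `resolved_word` and the `min_votes` parameter are assumptions. With `min_votes=1` it behaves as the slide describes; a higher threshold would require several users to agree before the word counts as known.

```python
from collections import Counter, defaultdict

# votes[unknown_id] counts the transcriptions users typed for that word.
votes = defaultdict(Counter)


def check_pair(known_answer, typed_known, unknown_id, typed_unknown):
    """Pass the user only if the known (control) word is typed correctly.

    On a pass, the text typed for the unknown word is recorded as a vote.
    """
    if typed_known.strip().lower() != known_answer.lower():
        return False
    votes[unknown_id][typed_unknown.strip().lower()] += 1
    return True


def resolved_word(unknown_id, min_votes=1):
    """Return the leading transcription once it has min_votes, else None."""
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= min_votes else None
```

The user is judged only on the control word, so a wrong guess on the unknown word never locks out a human; it just adds a competing vote.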

Help advance AI knowledge
  CAPTCHAs are called hard AI problems
  A win-win scenario:
    If a CAPTCHA is broken by a bot, a hard AI problem is solved
    If it is not yet broken, the current implementation is able to withstand attacks
  Thus AI knowledge is advanced if CAPTCHAs are broken

Constructing CAPTCHAs
Things to keep in mind:
  Don't store the CAPTCHA solution in the Web page's metadata
  A CAPTCHA is no good if it doesn't distort
  Need a large database of different CAPTCHA questions
  Avoid repetition of questions

CAPTCHA Logic:
  Generate the question
  Persist the correct answer
  Present the question to the user
  Evaluate the answer; if incorrect, start again and generate a different CAPTCHA
  If correct, allow access to the user
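
The loop above might look roughly like this on the server side. It is a sketch under stated assumptions: the `CaptchaSession` class, the 6-character length, and the alphabet are all hypothetical, and rendering the text as a distorted image is left out entirely.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


class CaptchaSession:
    """Hypothetical per-user state; the answer never leaves the server."""

    def __init__(self):
        self._answer = None    # persisted correct answer
        self._previous = None  # last challenge text, to avoid repeating it

    def new_challenge(self, length=6):
        """Generate the question and persist its answer.

        The returned text is what would be drawn into a distorted image.
        """
        while True:
            text = "".join(secrets.choice(ALPHABET) for _ in range(length))
            if text != self._previous:
                break
        self._answer = self._previous = text
        return text

    def submit(self, guess):
        """Evaluate one guess. Returns (allowed, next_challenge_or_None)."""
        # Each challenge can be answered once; clear it before comparing.
        answer, self._answer = self._answer, None
        if answer is not None and guess.strip().upper() == answer:
            return True, None
        # Incorrect: start again with a different CAPTCHA.
        return False, self.new_challenge()
```

Clearing the stored answer on every submission means a failed challenge cannot be retried; the caller has to present the freshly generated one.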

Embeddable CAPTCHAs:
  Available freely; just embed the code into the Web page's HTML, e.g., from www.recaptcha.net
  No maintenance

Custom CAPTCHAs:
  Fit the theme of the page
  Better protected from spammers
  Can be written in any language: Perl, .NET, ASP, JavaScript

Guidelines:
  Accessibility
  Image security
  Script security
  Security after widespread adoption
  Custom implementation or a general CAPTCHA?

Breaking CAPTCHAs
Cracking CAPTCHAs through programs:
  Convert the CAPTCHA into greyscale
  Detect patterns in the image corresponding to characters
  Or, read the session files of that user to learn the CAPTCHA word
  Solution: Only store a hash of the CAPTCHA word in session files
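
One way to store only a hash is sketched below. The slides do not name a hash function; this example assumes a keyed HMAC-SHA256, because a plain hash of a short word is easy to brute-force. `SERVER_KEY`, `answer_digest`, and `verify` are hypothetical names, and the key is assumed to be kept outside the session store.

```python
import hashlib
import hmac
import secrets

# Assumption: a server-side secret that is never written to session files.
SERVER_KEY = secrets.token_bytes(32)


def answer_digest(answer: str) -> str:
    """Return the value to store in the session in place of the word."""
    normalized = answer.strip().lower().encode("utf-8")
    return hmac.new(SERVER_KEY, normalized, hashlib.sha256).hexdigest()


def verify(stored_digest: str, guess: str) -> bool:
    """Check a guess by hashing it the same way and comparing digests."""
    return hmac.compare_digest(stored_digest, answer_digest(guess))
```

An attacker who reads the session file then sees only a 64-character hex digest, which cannot be checked against candidate words without the key.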

Greg Mori and Jitendra Malik have broken text CAPTCHAs, e.g., EZ-Gimpy
To break this CAPTCHA:
  Segmentation: locate possible letters in the image
  Construct a graph of consistent letters
  Find plausible words from the graph and use scores to rank them, e.g., roll = 11.94, profit = 9.42 (better match)
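
The final ranking step can be illustrated with the two scores quoted above. The slide marks the lower score as the better match, so this sketch assumes lower is better; how the scores are computed is not covered here, and `rank_candidates` is a hypothetical name.

```python
def rank_candidates(scores):
    """Order candidate words best-first, assuming a lower score is better."""
    return sorted(scores, key=scores.get)


# Scores taken from the slide's example.
ranked = rank_candidates({"roll": 11.94, "profit": 9.42})
best = ranked[0]
```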

Social engineering to break CAPTCHAs:
  A spammer encounters a CAPTCHA
  That CAPTCHA is copied to another site
  Humans are baited, e.g., with free MP3s
  To get those MP3s, users are told to solve the copied CAPTCHA
  The solution is routed to the spammer
  Solution: Fix a time-to-live period for a question
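
A time-to-live check could be sketched as below. The `TimedChallenge` class and the 120-second default are assumptions chosen for illustration; the slides do not give a value. The optional `now` argument exists only so the expiry can be exercised without waiting.

```python
import time


class TimedChallenge:
    """Hypothetical challenge record that expires after ttl_seconds."""

    def __init__(self, answer, ttl_seconds=120.0, now=None):
        start = time.monotonic() if now is None else now
        self.answer = answer
        self.expires_at = start + ttl_seconds

    def check(self, guess, now=None):
        """Reject any answer after expiry, even a correct one."""
        current = time.monotonic() if now is None else now
        if current > self.expires_at:
            return False
        return guess.strip().lower() == self.answer.lower()
```

A relayed CAPTCHA takes extra time to reach a baited human and come back, so a short lifetime narrows the window in which the relayed solution is still accepted.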

CAPTCHA cracking as a business:
  Firms offer CAPTCHA cracking services in exchange for money

Issues with CAPTCHAs
Usability issues:
  W3C accessibility guidelines call for the Web to be accessible to all people
  Some CAPTCHAs are inaccessible to visually impaired and cognitively challenged people

Compatibility issues:
  JavaScript may need to be enabled in the browser
  Some may need the Adobe Flash plugin installed

Summary
CAPTCHAs are an effective way to counter bots and reduce spam
They serve a dual purpose: they also help advance AI knowledge
Applications vary from stopping bots to character recognition and pattern matching
Some issues with current implementations represent challenges for future improvements

THANK YOU
