DAWECA Notes

Uploaded by

Moad Elmardi

ELMARDI – EL HAJJAR

An Experimental Investigation of Text-based CAPTCHA Attacks and Their Robustness

• The optical character recognition (OCR) technique can effectively attack the
primitive text-based CAPTCHAs that usually contain simple characters.

• Long short-term memory (LSTM) can recognize a text sequence end to end.

• Instead of collecting a large number of real-world CAPTCHA samples, we train a
model with synthetic samples and then fine-tune it with a small number of
real-world CAPTCHA samples. This is the approach taken by transfer
learning-based methods.

• A large text-based CAPTCHA dataset is open for access via
https://www.kaggle.com/datasets/chenxuanxiao/xdcaptcha-dataset and
https://drive.google.com/drive/folders/1tgefqBsNUESpgERgSP21Dn1fadrUrGUf

• Text-based CAPTCHAs can be classified by their resistance mechanisms into
three categories:

o Enriching character shapes: Character rotation, multi-fonts, hollow …

o Complicating character structures: Overlapping, CCT …

o Adding auxiliary interference: Noise arcs, complex background…

• Attacks fall into two categories:

o Segmentation-based: separate the characters from one another and then
recognize each one individually.

o Nonsegmentation-based: recognize the whole text sequence in one step via
deep learning models.

Application Système 1
▪ End-to-end methods: train recognition models directly.

▪ Transfer learning-based attacks: fine-tune pretrained models with further
training.
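The segmentation step of a segmentation-based attack can be sketched as follows. This is a minimal illustration only, assuming a binarized image stored as rows of 0/1 values and using a simple column-projection split; the notes do not say which segmentation method the paper uses, and the function names segment_columns and crop are hypothetical.

```python
def segment_columns(binary):
    """Return (start, end) column ranges, end exclusive, that contain ink.

    `binary` is a list of rows; 1 means ink, 0 means background.
    A column with no ink is treated as a gap between characters.
    """
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]

    segments = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a new character begins here
        elif not ink and start is not None:
            segments.append((start, x))    # the character ended at the gap
            start = None
    if start is not None:                  # character touching the right edge
        segments.append((start, width))
    return segments


def crop(binary, segment):
    """Cut one character out of the image, ready for a per-character recognizer."""
    start, end = segment
    return [row[start:end] for row in binary]
```

A split like this only works when characters are separated by empty columns, which is why overlapping or crowded characters push attackers toward nonsegmentation-based methods.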

• Traditional attacks consist of three main steps: preprocessing, segmentation,
and recognition. To split the characters accurately, preprocessing is usually
required to remove the added interference.

• Generative adversarial network (GAN)-based models, a deep learning technique,
are used in the preprocessing stage to automatically remove the complex
interference features of text-based CAPTCHAs.
• Preprocessing:

o Image binarization: transform color images into black and white by changing
the color of each pixel using a thresholding algorithm.

o Dilation and erosion: adding and removing pixels around the objects in an
image, respectively.

o CFS: By filling the pixels in the same connected domain, it can convert
hollow characters into solid ones to facilitate better segmentation.
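The binarization, dilation, and erosion steps listed above can be sketched in plain Python. This is a simplified illustration, assuming a grayscale image stored as rows of 0-255 integers, a fixed threshold of 128, and a 3x3 neighbourhood; the function names are hypothetical, and a real attack would more likely rely on an image-processing library and an adaptive threshold.

```python
def binarize(gray, threshold=128):
    """Map grayscale pixels to 0/1: 1 = dark ink, 0 = light background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def _morph(binary, combine):
    """Apply `combine` to each pixel's 3x3 neighbourhood.

    Pixels outside the image are treated as background (0).
    """
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                binary[ny][nx] if 0 <= ny < h and 0 <= nx < w else 0
                for ny in (y - 1, y, y + 1)
                for nx in (x - 1, x, x + 1)
            ]
            out[y][x] = combine(window)
    return out


def dilate(binary):
    """Grow objects: a pixel becomes ink if any neighbour is ink."""
    return _morph(binary, max)


def erode(binary):
    """Shrink objects: a pixel stays ink only if all neighbours are ink."""
    return _morph(binary, min)
```

Eroding and then dilating removes isolated specks such as thin noise, while dilating and then eroding closes small gaps inside strokes.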

Breaking CAPTCHAs on the Dark web

Problem 1: We want to do web scraping (data collection), but CAPTCHAs prevent it.

• Scrapers are tools that enable navigation of websites and extraction of relevant
information for the user (more details in the link below).

• CAPTCHAs differentiate between humans and bots. They are easy for humans to solve
but difficult for bots.

• CAPTCHAs are used by website administrators to prevent automated activities
such as spam, DDoS attacks, and web scraping.

• Web scrapers are bots.

• https://proxyway.com/guides/how-to-bypass-captcha?ref=parsehub.com: A site
for viewing different types of CAPTCHAs.

Problem 2: How can a web scraper bypass a CAPTCHA that prevents it from scraping the web?
What is the impact of breaking CAPTCHAs? What role do OCR and ML play?

Why are CAPTCHAs used on the dark web?

For the same reasons as on the surface web, but also for:

• Preventing bot activity,

• Reducing load on hidden services,

• Mitigating cybersecurity threats,

• Protecting against Tor-specific issues,

• Regulating user access,

• Adhering to cultural and security norms.

Operational methods:

• Two methods for breaking CAPTCHAs:

o Using OCR (e.g., Tesseract)

o Using Machine Learning (e.g., TensorFlow)

Dataset details:

• Two CAPTCHA datasets, each containing 100,000 images.

• CAPTCHAs are images of five characters.

• A third dataset is a combination of the two datasets.

• A test dataset contains 1,000 images (500 from each).

• The characters 'O', 'o', and '0' are absent from the CAPTCHAs.

Comparison methodology:

• Both Tesseract and TensorFlow are compared on the same test dataset.

• Evaluation metrics:

o Success rate: Indicates whether the CAPTCHA is solved correctly. If even one
of the five characters is predicted incorrectly, the entire CAPTCHA is
considered incorrectly solved.

o Accuracy: Measured using the Levenshtein distance
(https://en.wikipedia.org/wiki/Levenshtein_distance).
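The two metrics can be sketched as follows. This is a minimal illustration, assuming predictions and ground-truth labels are given as equal-length lists of strings. The function names are hypothetical, and the notes do not say how the Levenshtein distance is turned into an accuracy figure, so the normalization in char_accuracy (one minus the distance divided by the longer string's length) is an assumption.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string `a` into string `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def success_rate(predictions, labels):
    """Fraction of CAPTCHAs solved exactly; one wrong character fails the whole CAPTCHA."""
    solved = sum(p == t for p, t in zip(predictions, labels))
    return solved / len(labels)


def char_accuracy(prediction, label):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(prediction), len(label))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(prediction, label) / longest
```

With this split, a prediction that misses one of five characters counts as a failure for the success rate but still scores 0.8 under the assumed accuracy normalization, so accuracy shows how close a near miss was.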

Evaluation results:

• TensorFlow Success Rate: DS2 > DS1 > DS1+2

• Tesseract Success Rate: DS1 > DS2

o Note: TensorFlow outperforms Tesseract for this metric.

• TensorFlow Accuracy: DS2 > DS1 > DS1+2

• Tesseract Accuracy: DS1 > DS2
