DAWECA Notes
• The optical character recognition (OCR) technique can effectively attack the
primitive text-based CAPTCHAs that usually contain simple characters.
• Long short-term memory (LSTM) can recognize a text sequence end to end.
background…
System Application 1
▪ End-to-end methods: train recognition models directly.
o Dilation and erosion: adding and removing pixels around the objects in an
image, respectively.
o CFS: by filling the pixels within each connected domain, hollow characters are
converted into solid ones, which makes segmentation easier.
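The preprocessing steps above can be sketched in a few lines. This is a minimal illustration, not the notes' actual pipeline: it assumes a binary image stored as a list of rows (1 = ink, 0 = background), a 3x3 square neighbourhood, and it reads the CFS step as "fill every background region that is not connected to the image border". The function names `dilate`, `erode`, and `fill_hollow` are made up for this example.

```python
from collections import deque


def _window(img, y, x):
    """Values of the 3x3 window centred on (y, x); out-of-bounds counts as background."""
    h, w = len(img), len(img[0])
    return [
        img[ny][nx] if 0 <= ny < h and 0 <= nx < w else 0
        for ny in range(y - 1, y + 2)
        for nx in range(x - 1, x + 2)
    ]


def dilate(img):
    """Dilation: a pixel becomes ink if ANY pixel in its 3x3 window is ink."""
    return [[int(any(_window(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]


def erode(img):
    """Erosion: a pixel stays ink only if ALL pixels in its 3x3 window are ink."""
    return [[int(all(_window(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]


def fill_hollow(img):
    """Assumed reading of CFS: flood-fill the background from the border;
    background pixels that are never reached are enclosed holes and become ink."""
    h, w = len(img), len(img[0])
    reached = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed the flood fill with every background pixel on the image border.
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and img[y][x] == 0:
                reached[y][x] = True
                queue.append((y, x))
    # Spread through 4-connected background pixels.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 0 and not reached[ny][nx]:
                reached[ny][nx] = True
                queue.append((ny, nx))
    # Everything not reached is either ink or an enclosed hole.
    return [[int(not reached[y][x]) for x in range(w)] for y in range(h)]


# A hollow 3x3 "character" outline inside a 5x5 image.
ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
solid = fill_hollow(ring)  # the centre pixel is now ink
```

In practice an image library would normally handle these operations; the pure-Python version is only meant to make the definitions concrete.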
System Application 2
Breaking CAPTCHAs on the Dark web
• Scrapers are tools that enable navigation of websites and extraction of relevant
information for the user (more details in the link below).
• CAPTCHAs differentiate between humans and bots. They are easy for humans to solve
but difficult for bots.
• https://proxyway.com/guides/how-to-bypass-captcha?ref=parsehub.com: A site
for viewing different types of CAPTCHAs.
Problem 2: How can a web scraper bypass a CAPTCHA that prevents it from scraping the web?
What is the impact of breaking CAPTCHAs? What role do OCR and ML play?
For the same reasons as on the surface web, but also for:
Operational methods:
Dataset details:
• The characters 'O', 'o', and '0' are absent from the CAPTCHAs.
Comparison methodology:
• Tesseract and TensorFlow are compared on the same test dataset.
• Evaluation metrics:
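As an illustration only, comparing two solvers on the same test dataset can be sketched as a small harness that runs each one over identical labelled samples. The two metrics used here (whole-CAPTCHA exact match and per-character accuracy) are assumptions chosen for the example, not the metrics of the original evaluation, and `tesseract_solver` / `tensorflow_solver` are hypothetical stand-ins that return canned answers rather than calls into the real libraries.

```python
def evaluate(solver, samples):
    """Score one solver on (image_id, ground_truth) pairs.

    Returns two assumed metrics:
      captcha_accuracy: share of CAPTCHAs predicted exactly right
      char_accuracy:    share of ground-truth characters matched position by position
    """
    exact = 0
    char_hits = 0
    char_total = 0
    for image_id, truth in samples:
        pred = solver(image_id)
        exact += int(pred == truth)
        # zip stops at the shorter string, so missing characters count as misses.
        char_hits += sum(p == t for p, t in zip(pred, truth))
        char_total += len(truth)
    return {
        "captcha_accuracy": exact / len(samples),
        "char_accuracy": char_hits / char_total,
    }


# Toy test set shared by both solvers (IDs and labels are made up).
samples = [("img_001", "ab12"), ("img_002", "cd34")]


def tesseract_solver(image_id):
    # Hypothetical stand-in: gets one character wrong on the second image.
    return {"img_001": "ab12", "img_002": "cd35"}[image_id]


def tensorflow_solver(image_id):
    # Hypothetical stand-in: solves both images.
    return {"img_001": "ab12", "img_002": "cd34"}[image_id]


results = {
    "tesseract": evaluate(tesseract_solver, samples),
    "tensorflow": evaluate(tensorflow_solver, samples),
}
```

Passing the same `samples` list to both calls is what keeps the comparison fair: any difference in the scores then comes from the solver, not from the data.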
Evaluation results:
System Application 4