Research On CAPTCHA Recognition Technology Based o
DOI: 10.54254/2755-2721/81/20240967
Shengyuan Tang
College of Mathematics and Information & College of Software Engineering, South
China Agricultural University, Guangzhou, 510642, China
Abstract. CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans
Apart) is a security technique widely used on the Internet to distinguish human users from
automated programs. The rapid development of deep learning has produced image recognition
techniques with excellent performance, which has made deep learning-based CAPTCHA recognition
a research area of significant interest. This paper addresses the challenging problem of CAPTCHA
recognition using deep learning techniques: it reviews the development and classification of
CAPTCHAs, examines traditional CAPTCHA recognition methods, and explores the application of
deep learning to CAPTCHA recognition. On this basis, a CAPTCHA recognition system is designed
and its effectiveness is verified through experiments. The proposed deep learning-based approach
improves the accuracy and efficiency of CAPTCHA recognition and provides new ideas and methods
for the further development of the field. Future work will explore additional deep learning models
and techniques for CAPTCHA recognition, with the aim of improving both the security and the user
experience of CAPTCHAs.
1. Introduction
CAPTCHA, which stands for Completely Automated Public Turing test to tell Computers and Humans
Apart, was first conceived in 2000 to distinguish whether a user is a computer or a human [1] [2]. In the
contemporary era, CAPTCHA is a prevalent tool employed by numerous websites, largely due to its
straightforward maintenance and favorable user experience [3]. However, the rapid development of
image recognition technology is posing an increasing challenge to traditional CAPTCHA techniques,
making the effective recognition and cracking of CAPTCHAs a significant research topic [4]. This paper
develops a deep learning-based CAPTCHA recognition system that uses a lightweight model to maintain
recognition accuracy while reducing recognition time enough to meet the time constraints of entering
CAPTCHAs on websites. The experimental results also point to directions for future improvements in
CAPTCHA recognition. In this way, the paper contributes to more accurate and efficient CAPTCHA
recognition through deep learning, since advanced deep learning algorithms can handle complex
CAPTCHA recognition tasks efficiently. Moreover, a comprehensive examination of CAPTCHA
recognition technology can reveal the shortcomings of existing CAPTCHAs, help improve current
CAPTCHA generation mechanisms, reinforce the security of existing systems, and offer new concepts
and methods for the future development of CAPTCHA technology. As network security grows in
importance, improving CAPTCHA recognition and generation techniques therefore has considerable
practical value.
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
Proceedings of the 2nd International Conference on Machine Learning and Automation
3. Methodology
3.1.1. Environment Configuration. The experiment was run on dedicated hardware and a fixed software
environment to provide sufficient compute for the deep learning task. Table 1 lists the hardware: a 4-core
CPU, a V100 AI accelerator with 32GB of GPU memory, 32GB of RAM, and 100GB of total storage.
Table 2 lists the software environment, namely the versions of Python and PaddlePaddle used.
Table 1. Hardware Information for this Experiment
Component            Model or Specification
CPU                  4 cores
AI Accelerator       V100
Total GPU Memory     32GB
Total RAM            32GB
Total Storage        100GB
Table 2. Environment Configuration
Software             Version
Python               3.7.4
PaddlePaddle         2.3.2
3.1.2. Parameter Configuration. The training configuration for the deep learning-based CAPTCHA
recognition model specifies the optimizer, model architecture, loss functions, evaluation metrics, and the
parameters used for training and evaluation. In the global configuration, the total number of training
epochs is set to 100, and the training log is printed every 10 batches. A path to pretrained model weights
is specified, and a cosine learning rate scheduler is used with an initial learning rate of 0.001.
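The cosine scheduler decays the learning rate along half a cosine curve, lr(t) = lr_min + (lr_0 - lr_min) * 0.5 * (1 + cos(pi * t / T)). The sketch below implements that standard formula under stated assumptions: no warmup phase, a minimum rate of 0, and one evaluation per epoch over the 100 configured epochs (PaddlePaddle's own scheduler may update per iteration and add warmup). `cosine_lr` is a hypothetical helper, not a PaddlePaddle API.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float = 0.001, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate at `step` (assumes no warmup)."""
    if not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps]")
    # Fraction of the schedule completed, in [0, 1].
    progress = step / total_steps
    # Cosine factor falls smoothly from 1 at the start to 0 at the end.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


if __name__ == "__main__":
    total_epochs = 100  # matches the configured number of epochs
    for epoch in (0, 25, 50, 75, 100):
        print(f"epoch {epoch:3d}: lr = {cosine_lr(epoch, total_epochs):.6f}")
```

Under these assumptions the rate is 0.001 at the start, halves to 0.0005 at epoch 50, and reaches zero at epoch 100; the slow decay at both ends is the main reason cosine schedules are often preferred over step decay.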
For the model architecture, MobileNetV1Enhance is selected as the backbone network, followed by a
multi-head structure consisting of a CTCHead (Connectionist Temporal Classification) and a SARHead
(Show, Attend and Read). The CTCHead uses SVTR (Scene Text Recognition with a Single Visual Model)
as its neck, configured with 64 dimensions and 2 layers, whereas the SARHead is configured with a
512-dimensional encoder and a maximum text length of 25 characters. The Adam optimizer is used, with
beta1 and beta2 set to 0.9 and 0.999, respectively. The loss function combines CTCLoss and SARLoss in
a multi-loss structure, one term per head, to better adapt to different types of CAPTCHA recognition
tasks. The training data pipeline applies preprocessing and augmentation operations such as image
decoding, image augmentation, and resizing. The batch size is set to 256 per GPU for training and 128
per GPU for evaluation.
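At inference time, a CTC head outputs a score for every character class, plus a special "blank" class, at each position of the feature sequence. The minimal sketch below shows best-path (greedy) CTC decoding, assuming that predictions are decoded from the CTC branch and that class index 0 is the blank, a common convention. The function name and the toy scores are hypothetical and are not taken from the paper or from PaddleOCR's code.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: class index 0 is the CTC blank; characters start at index 1


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], charset: str) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, then drop blanks."""
    # Step 1: choose the highest-scoring class at every time step (frame).
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    decoded: List[str] = []
    previous: Optional[int] = None
    for idx in best_path:
        # Step 2: skip a frame that repeats the previous frame's class (merge repeats).
        # Step 3: skip the blank class; map other indices back to characters.
        if idx != previous and idx != BLANK:
            decoded.append(charset[idx - 1])  # shift by one because blank occupies index 0
        previous = idx
    return "".join(decoded)


# Toy scores for 7 frames over 3 classes: [blank, 'A', 'B'] (hypothetical values).
example_frames = [
    [0.1, 0.8, 0.1],    # A
    [0.2, 0.7, 0.1],    # A (repeat, merged)
    [0.9, 0.05, 0.05],  # blank (separates the two A's)
    [0.1, 0.6, 0.3],    # A
    [0.1, 0.2, 0.7],    # B
    [0.2, 0.1, 0.7],    # B (repeat, merged)
    [0.8, 0.1, 0.1],    # blank
]

if __name__ == "__main__":
    print(ctc_greedy_decode(example_frames, "AB"))  # AAB
```

The blank frame between the two "A" frames is what lets a genuinely repeated character survive the merge step; without it, consecutive identical predictions collapse into a single character.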
on the given split ratio. The dataset is then divided into training and testing sets using slicing operations.
To build the character dictionary, the `gen_dict()` function creates a character list of digits and letters,
which defines the mapping between characters and class indices used in subsequent CAPTCHA
recognition. Finally, the `write_file()` function writes the processed data to the specified files: the
training set list `train_list.txt`, the testing set list `test_list.txt`, and the character dictionary `dict.txt`.
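The preparation steps above can be sketched as follows. This is a hypothetical re-creation, not the author's code. The following are all assumptions: the signatures of `gen_dict()` and `write_file()`, the helper names `split_dataset` and `prepare`, the shuffle before slicing, the 0.9 default split ratio, the use of both lower- and upper-case letters, and the tab-separated `image_path<TAB>label` line format (the format PaddleOCR's text-recognition loader commonly reads).

```python
import os
import random
import string
import tempfile
from typing import List, Sequence, Tuple

Sample = Tuple[str, str]  # (image path, label text)


def gen_dict() -> List[str]:
    """Hypothetical: digits followed by lower- and upper-case ASCII letters (62 characters)."""
    return list(string.digits + string.ascii_letters)


def split_dataset(samples: Sequence[Sample], train_ratio: float = 0.9,
                  seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle (assumed step), then slice into training and testing sets at train_ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def write_file(path: str, lines: Sequence[str]) -> None:
    """Write one entry per line as UTF-8 text."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)


def prepare(samples: Sequence[Sample], out_dir: str, train_ratio: float = 0.9) -> None:
    train, test = split_dataset(samples, train_ratio)
    # Label lists: one "<image path>\t<label>" entry per line (assumed format).
    write_file(os.path.join(out_dir, "train_list.txt"), [f"{p}\t{t}" for p, t in train])
    write_file(os.path.join(out_dir, "test_list.txt"), [f"{p}\t{t}" for p, t in test])
    # Character dictionary: one character per line.
    write_file(os.path.join(out_dir, "dict.txt"), gen_dict())


if __name__ == "__main__":
    demo = [(f"images/{i:04d}.png", f"{i:04d}") for i in range(10)]
    with tempfile.TemporaryDirectory() as out_dir:
        prepare(demo, out_dir, train_ratio=0.8)
        print(sorted(os.listdir(out_dir)))  # ['dict.txt', 'test_list.txt', 'train_list.txt']
```

Shuffling with a fixed seed before slicing avoids a split that simply follows file order (for example, all CAPTCHAs generated by one style landing in the test set) while keeping runs reproducible.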
4. Results
The experimental results show that the model performs well on the CAPTCHA recognition task. The
best accuracy of 0.997 and the normalized edit distance of 0.999 indicate near-perfect character
recognition. Although float16 (half-precision) computation was not used during training, the model still
achieved 2491 frames per second (fps), demonstrating high processing efficiency. Training converged
rapidly and stably, with the best results observed at the 100th epoch, which is also the final epoch of the
configured schedule.
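The two reported metrics can be illustrated with a small sketch. It assumes that accuracy counts exact string matches and that normalized edit distance is reported as one minus the mean Levenshtein distance, with each distance divided by the length of the longer string. This is a common definition, but the paper does not state its exact formula. The function names and example strings are hypothetical.

```python
from typing import Dict, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def rec_metrics(preds: Sequence[str], labels: Sequence[str]) -> Dict[str, float]:
    """Sequence accuracy and 1 - mean normalized edit distance (assumed definitions)."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and of equal length")
    pairs = list(zip(preds, labels))
    acc = sum(p == t for p, t in pairs) / len(pairs)
    # Normalize each distance by the longer string; max(..., 1) avoids 0/0 for two empty strings.
    mean_norm = sum(edit_distance(p, t) / max(len(p), len(t), 1) for p, t in pairs) / len(pairs)
    return {"acc": acc, "norm_edit_dis": 1.0 - mean_norm}


if __name__ == "__main__":
    preds = ["a7K2", "x9Qm", "B3ff"]
    labels = ["a7K2", "x9Qn", "B3ff"]
    print(rec_metrics(preds, labels))
```

Under this definition, an exact match contributes a distance of 0 and any mismatch contributes at most 1. The normalized edit distance can therefore never be lower than the accuracy, because near misses earn partial credit. This is consistent with the reported 0.999 versus 0.997.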
A comparative analysis further highlights the model's strength on CAPTCHA recognition tasks: accuracy
and normalized edit distance both approach 1, indicating high reliability and precision. The high
throughput makes the model suitable for real-time applications that require quick responses, such as
automated form submissions, online security checks, and other verification systems. Because this
performance was obtained without float16 precision, the model's speed does not depend on reduced
numerical precision, and it maintains high accuracy and speed at full precision.
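As a rough illustration of what the throughput means for responsiveness, 2491 fps corresponds to the following average time per image. This back-of-the-envelope figure is amortized over batched processing. The latency of a single request, including image download and preprocessing, would likely be higher.

$$
t_{\text{img}} = \frac{1\ \text{s}}{2491} \approx 4.0 \times 10^{-4}\ \text{s} \approx 0.40\ \text{ms}
$$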
Overall, the model performs exceptionally well in CAPTCHA recognition, achieving high accuracy
and fast processing speed. These attributes make it suitable for practical applications, enhancing user
experience and security in various online services. The successful integration of MobileNetV1 and the
SVTR algorithm showcases the potential of deep learning in tackling complex CAPTCHA recognition
tasks, offering a promising direction for future research and development in this field.
5. Discussion
Adversarial Networks (GANs), and verify their effect on enhancing the robustness of CAPTCHA
recognition systems. In summary, further experiments can examine the development of CAPTCHA
recognition technology from multiple angles, providing useful references for future research in the
field.
6. Conclusion
In this paper, a deep learning-based CAPTCHA recognition system built on a MobileNetV1 backbone is
proposed, achieving high recognition speed and accuracy. The system demonstrates that combining
lightweight models with advanced recognition algorithms can substantially improve CAPTCHA
recognition performance. Beyond improving recognition itself, this work can inform the design of more
secure CAPTCHA generation mechanisms. Studying CAPTCHA recognition technology reveals the
limitations of current CAPTCHAs, helps optimize the existing generation process, reinforces the security
of existing systems, and suggests new concepts and methods for the future development of CAPTCHA
technology. Future research may investigate the resistance of CAPTCHAs to attacks, multimodal
recognition, and deep learning-based CAPTCHA generation, with the aim of further strengthening
CAPTCHA technology.
References
[1] Lorenzi, D., Uzun, E., Vaidya, J. and Sural, S. (2018) Towards Designing Robust CAPTCHAs.
Journal of Computer Security, 26(6): 731 - 760.
[2] Von Ahn, L., Blum, M., Hopper, N.J. and Langford, J. (2003) CAPTCHA: Using Hard AI
Problems for Security. EUROCRYPT'03: Proceedings of the 22nd international conference on
Theory and Applications of Cryptographic Techniques, 294-311.
[3] Chen, S.Y. (2024) Complex Text CAPTCHA Recognition with Small Data Set. Electronic Design
Engineering, 32(3): 54-58.
[4] Krizhevsky, A., Sutskever, I. and Hinton, G.E. (2012) ImageNet Classification with Deep
Convolutional Neural Networks. Advances in Neural Information Processing Systems, 25:
1097-1105.
[5] Gao, H.C., Wang, W., Qi, J., et al. (2013) The Robustness of Hollow CAPTCHAs. Proceedings
of the 2013 ACM SIGSAC Conference on Computer & Communications Security, 1075-1086.
[6] Kumar, M., Jindal, M.K., et al. (2021) A Systematic Survey on CAPTCHA Recognition: Types,
Creation and Breaking Techniques. Archives of Computational Methods in Engineering,
1107-1136.
[7] Huang, K.Z., Hussain, A., Wang, Q.F. and Zhang, R. (2019) Deep Learning: Fundamentals,
Theory and Applications. Cognitive Computation Trends (COCT), 2: 111-138.
[8] Ketkar, N. (2017) Convolutional Neural Networks. Springer International Publishing, 63-78.
[9] Huang, G., Liu, Z., et al. (2017) Densely Connected Convolutional Networks. Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, 4700-4708.
[10] He, K.M., Zhang, X.Y., Ren, S.Q. and Sun, J. (2016) Deep Residual Learning for Image
Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
770-778.