Detection of Text from Lecture Video Images
Abstract — Text data in lecture video images plays a crucial role in understanding lecture videos. Several stages are involved in detecting such data from images using text detection methods. The goal of this paper is to study the existing text detection methods and identify the best among them. The paper presents a deep learning-based methodology known as CTPN that accurately localizes text lines in a natural image in the text detection stage. The CTPN operates reliably on multi-scale and multi-language text without any additional post-processing, departing from previous bottom-up techniques that require multi-step post-filtering. The results of the detected text are shown in this paper.

Keywords — Text detection, text localization, conventional approach, CTPN algorithm.

I. INTRODUCTION
Many applications in the fields of image processing and computer vision need to extract the content of a given image. Content in an image is broadly divided into two primary classes: a) semantic content and b) perceptual content. Semantic content refers to the various objects, events, and their relations; it deals with the connection between words, expressions, signs, and images. Put simply, semantics refers to the study of meaning. Perceptual content, in contrast, is content that can be directly perceived; it includes properties such as color, intensity, shape, texture, and their temporal changes. A great deal of work has been reported on perceptual content, while semantic image content such as text, faces, vehicles, and human actions has attracted renewed interest. In this paper, the focus is on semantic image content in the form of text, since text within an image is particularly interesting and very helpful for describing the content of an image.

When OCR is applied to natural scene images, the success rate of OCR drops drastically for several reasons [15]. First, most OCR tools are designed for scanned documents and rely on segmentation to separate text from background pixels. Second, scene text exhibits a wide variety of designs in sizes, fonts, orientations, and colors, together with background anomalies that resemble text characters, such as windows, blocks, and other character-like textures. Finally, the page layout for conventional OCR is simple and well organized, whereas in natural scene images there is considerably less text, and the text that exists generally has little structure and high variability in both geometry and appearance. Because of these issues, scene text images typically suffer from photometric degradations as well as geometric distortions, so that many algorithms face accuracy and/or speed problems. Hence, the main task when designing applications that use text data in natural scene images is to transform the image so that current OCR tools can handle it.

The remainder of the paper is organized as follows: Section II gives brief related work on text detection modules. Section III gives an outline of the application space for TD in image processing and a comparative study of different text detection methods. Section IV describes the CTPN algorithm with its implementation, and Section V gives the conclusion.

II. RELATED WORK
Motivated by the broad spectrum of potential applications, research on scene text detection and recognition has become very important [24], [25], [36]–[42] and has recently seen a rise in research efforts in the computer vision community [26]–[35], [43]–[45]. Broad surveys on text detection and recognition in still images and videos can be found in [46]–[48].

Bottom-up methods have been used for text detection in earlier works. They are typically grouped into two types: "connected-components (CCs)" based methodologies and "sliding-window" based strategies. The CC-based methodologies separate text and non-text image elements using a fast filter; text elements are then grouped into character regions using low-level features, e.g., intensity, color, gradient, and so forth [17, 21, 22].

The sliding-window-based approaches detect text regions by densely sliding a multi-scale window over an image. Each window is classified as text or non-text by a pre-trained classifier using manually designed features [19, 20] or recent CNN features [18]. Various kinds of text detectors have been proposed; lately, they may be broadly classified into two styles, namely "conventional text detectors" and "deep learning-based text detectors". Conventional text detectors leverage hand-crafted low-level features and prior knowledge to identify text and non-text regions in the image.
However, these methods lack robustness against diverse text styles and cluttered images. To alleviate this issue, many deep learning-based text detection studies have been carried out, and they achieve the best results [14].
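The multi-scale sliding-window strategy surveyed above can be illustrated with a short sketch: slide a fixed-size window over the image at several scales and keep the windows that a text/non-text classifier accepts. Here `looks_like_text` is a hypothetical placeholder classifier, not a method from the cited works.

```python
# Illustrative multi-scale sliding-window text detection sketch.
# `looks_like_text` is a stand-in for the pre-trained text/non-text
# classifier the surveyed methods would use.
import numpy as np

def sliding_windows(image, win=32, stride=16, scales=(1.0, 0.5)):
    """Yield (scale, x, y, patch) for every window at every scale."""
    for s in scales:
        h, w = int(image.shape[0] * s), int(image.shape[1] * s)
        # Nearest-neighbour resize keeps the sketch dependency-free.
        ys = (np.arange(h) / s).astype(int)
        xs = (np.arange(w) / s).astype(int)
        scaled = image[np.ix_(ys, xs)]
        for y in range(0, h - win + 1, stride):
            for x in range(0, w - win + 1, stride):
                yield s, x, y, scaled[y:y + win, x:x + win]

def looks_like_text(patch):
    # Placeholder classifier: high-contrast patches count as "text".
    return patch.std() > 40

image = np.zeros((64, 64), dtype=np.uint8)
image[20:40, 8:56] = 255          # a bright "text" stripe
hits = [(s, x, y) for s, x, y, p in sliding_windows(image)
        if looks_like_text(p)]
print(len(hits) > 0)  # True: some windows cover the stripe
```

Dense window classification like this is what makes the approach slow: the classifier must run on every window at every scale, which is the main weakness the CTPN design avoids.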
III. TEXT DETECTION (TD) SYSTEM
Text in images and video can be classified into two classes: scene text and caption text. Figure 1 shows examples of scene text and caption text. Caption text is artificially overlaid on the images or video frames during editing. Even though some kinds of caption text are more common than others, caption text can have arbitrary font, size, color, orientation, and location. Over the course of a video, a caption text instance may remain fixed and rigid over time, or it may move, grow, and change color.

Once the text is detected in an image, it can be extracted in the form of plain text by applying a good OCR method.

Table 1. Text detection methods

Method         Advantage                                    Disadvantage
SWT [5][7][8]  Correctly detects text in various colors,    Slow performance.
               sizes, and styles.
[11]           Text detection can be done independently     If the background or the text is
               of position and rotation (in grayscale).     darker, additional information
                                                            is needed.
DTLN [1][12]   Recurrently memorizes previous detections    Slow performance.
               to avoid repeated detection of text at
               multiple scales, and efficiently captures
               crowded text samples in close proximity.

IV. CTPN ALGORITHM
This paper presents the Connectionist Text Proposal Network (CTPN), which accurately localizes text lines directly in convolutional feature maps [14]. It overcomes several primary limitations of earlier bottom-up methods for text detection. The approach draws on the strengths of strong deep convolutional features and uses the CTPN framework shown in Fig. 3.
Geeta S Hukkeri & R H Goudar. / IJETT, 70(1), 180-184, 2022
Fig. 3 Framework of the CTPN

The input image is passed to the VGG-16 model [23], and the feature maps output by its conv5 layer are obtained. A sliding window of size 3x3 is moved over the VGG-16 output features, which are then fed sequentially to an RNN consisting of a 256D bi-directional LSTM. This LSTM layer is connected to a 512D fully connected layer, which then produces the output layer.
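The pipeline just described can be sketched in PyTorch as follows. This is an illustrative reconstruction under stated assumptions, not the repository's actual code: the single strided convolution is a stand-in for the real VGG-16 trunk (in practice one would load `torchvision.models.vgg16` and keep its conv5 features), and the CTPN detection heads after the 512D fully connected layer are omitted.

```python
# Sketch of the CTPN feature pipeline: VGG-16 conv5 features ->
# 3x3 sliding window -> 256D bi-directional LSTM -> 512D FC layer.
import torch
import torch.nn as nn

class CTPNBackbone(nn.Module):
    def __init__(self, vgg_channels=512):
        super().__init__()
        # Stand-in for VGG-16 up to conv5 (overall stride 16).
        self.conv5 = nn.Conv2d(3, vgg_channels, 3, stride=16, padding=1)
        # 3x3 sliding window over the conv5 feature map.
        self.window = nn.Conv2d(vgg_channels, vgg_channels, 3, padding=1)
        # 256D bi-directional LSTM over each feature-map row
        # (output is 2 * 256 = 512 channels).
        self.rnn = nn.LSTM(vgg_channels, 256, bidirectional=True,
                           batch_first=True)
        # 512D fully connected layer feeding the output heads.
        self.fc = nn.Linear(512, 512)

    def forward(self, x):
        f = self.window(self.conv5(x))          # (B, C, H, W)
        b, c, h, w = f.shape
        # Treat each feature-map row as a left-to-right sequence.
        seq = f.permute(0, 2, 3, 1).reshape(b * h, w, c)
        out, _ = self.rnn(seq)                  # (B*H, W, 512)
        out = self.fc(out)
        return out.reshape(b, h, w, 512)

feats = CTPNBackbone()(torch.zeros(1, 3, 224, 224))
print(feats.shape)  # torch.Size([1, 14, 14, 512])
```

Running each feature-map row through the bi-directional LSTM is what lets the network connect fine-scale text proposals along a horizontal text line.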
A. Implementation
This paper implements the CTPN algorithm using its GitHub repository to localize text regions in an image. The Ubuntu operating system was used for this work. The steps are as follows:
a) Clone the repository: clone the CTPN GitHub repository.

b) Build the required libraries: generate the .so files for the NMS (non-maximum suppression) and bbox (bounding box) utilities so that these files can be loaded as libraries. The working directory must first be changed to "/text-detection-ctpn/utils/bbox".
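The non-maximum suppression step compiled in (b) can be illustrated with a simplified pure-Python sketch: keep the highest-scoring boxes and drop any box that overlaps an already-kept box beyond an IoU threshold. This is for clarity only, not the compiled implementation the repository builds.

```python
# Simplified non-maximum suppression (NMS) over detection boxes.

def iou(a, b):
    """Intersection-over-union of boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Return indices of the boxes kept after suppression."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i],
                   reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any kept box too much.
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 too much
```

The compiled .so version exists because this O(n²) pruning runs on thousands of fine-scale text proposals per image, where a pure-Python loop would be far too slow.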
Results are produced in the "/data/res" folder. A sample input image and its result are shown in Fig. 4 below.

(a) Input  (b) Output
Fig. 4 Text detection from a lecture video image using CTPN

V. CONCLUSION
This paper discussed two types of images: caption text images and scene text images. It dealt with scene text images and presented a general-purpose text detection (TD) system that performs detection and localization of the text in a given image. A study of different text detection methods was presented, and from the study of several papers, we found that the CTPN algorithm gives more accurate text detection than the other methods. The text detection results obtained are accurate and very useful.
[40] V. Wu, R. Manmatha, and E. M. Riseman, Textfinder: An automatic system to detect and recognize text in images, IEEE Trans. Pattern Anal. Mach. Intell., 21(11) (1999) 1224–1229.
[41] Y. Zhong, H. Zhang, and A. K. Jain, Automatic caption localization in compressed video, IEEE Trans. Pattern Anal. Mach. Intell., 22(4) (2000) 385–392.
[42] H. Li, D. Doermann, and O. Kia, Automatic text detection and tracking in digital video, IEEE Trans. Image Process., 9(1) (2000) 147–156.
[43] Y.-F. Pan, X. Hou, and C.-L. Liu, A hybrid approach to detect and localize texts in natural scene images, IEEE Trans. Image Process., 20(3) (2011) 800–813.
[44] C. Yao, X. Zhang, X. Bai, W. Liu, Y. Ma, and Z. Tu, Rotation-invariant features for multi-oriented text detection in natural images, PLoS ONE, 8(8) (2013) 1–15.
[45] X.-C. Yin, X. Yin, K. Huang, and H.-W. Hao, Accurate and robust text detection: A step-in for text retrieval in natural scene images, in Proc. 36th SIGIR, (2013) 1091.
[46] D. Chen, J. Luettin, and K. Shearer, A survey of text detection and recognition in images and videos, Institut Dalle Molle d'Intelligence Artificielle Perceptual, EPFL, Martigny, Switzerland, Tech. Rep. (2000) 1–21.
[47] K. Jung, K. Kim, and A. Jain, Text information extraction in images and video: A survey, Pattern Recognit., 37(5) (2004) 977–997.
[48] J. Liang, D. Doermann, and H. Li, Camera-based analysis of text and documents: A survey, Int. J. Document Anal. Recognit., 7(2) (2005) 84–104.
[49] M. Krishna Goriparthy and B. Geetha Lakshmi, Balanced Islanding Detection of Distributed Generator using Discrete Wavelet Transform and Artificial Neural Network, International Journal of Engineering Trends and Technology, 69(10) (2021) 57–71.
[50] Shankargoud Patil and Kappargaon S. Prabhushetty, Bi-Attention LSTM with CNN based Multi-task Human Activity Detection in Video Surveillance, International Journal of Engineering Trends and Technology, 69(11) (2021) 192–204.
[51] Vencelin Gino V and Amit KR Ghosh, Enhancing Cyber Security Measures for Online Learning Platforms, SSRG International Journal of Computer Science and Engineering, 8(11) (2021) 1–5.
[52] Neha Chaudhary and Priti Dimri, Enhancing the Latent Fingerprint Segmentation Accuracy Using Hybrid Techniques – WCO and BiLSTM, International Journal of Engineering Trends and Technology, 69(11) (2021) 161–169.
[53] Emmanoel Pratama Putra Hastono and Gede Putra Kusuma, Evaluation of Blockchain Model for Educational Certificate Using Continuous-Time Markov Chain, International Journal of Engineering Trends and Technology, 69(11) (2021) 61–70.
[54] Ningxia He, Image Sampling Using Q-Learning, SSRG International Journal of Computer Science and Engineering, 8(1) (2021) 5–12.
[55] Walther Calvo-Niño, Brian Meneses-Claudio, and Alexi Delgado, Implementation of an FM Radio Station to Facilitate Remote Education in the District of Cojata, International Journal of Engineering Trends and Technology, 69(10) (2021) 118–127.
[56] Osama R. Shahin, Zeinab M. Abdel Azim, and Ahmed I. Taloba, Maneuvering Digital Watermarking in Face Recognition, International Journal of Engineering Trends and Technology, 69(11) (2021) 104–112.
[57] Yasser Mohammad Al-Sharo, Amer Tahseen Abu-Jassar, Svitlana Sotnik, and Vyacheslav Lyashenko, Neural Networks as a Tool for Pattern Recognition of Fasteners, International Journal of Engineering Trends and Technology, 69(10) (2021) 151–160.
[58] Hoang Thi Phuong, Researching Robot Arms Control System Based on Computer Vision Application and Artificial Intelligence Technology, SSRG International Journal of Computer Science and Engineering, 8(1) (2021) 24–29.
[59] Nilesh Yadav and Narendra Shekokar, SQLI Detection Based on LDA Topic Model, International Journal of Engineering Trends and Technology, 69(11) (2021) 47–52.
[60] M. Ayaz Ahmad, Volodymyr Gorokhovatskyi, Iryna Tvoroshenko, Nataliia Vlasenko, and Syed Khalid Mustafa, The Research of Image Classification Methods Based on the Introducing Cluster Representation Parameters for the Structural Description, International Journal of Engineering Trends and Technology, 69(10) (2021) 186–192.