International Journal of Engineering Trends and Technology Volume 70 Issue 1, 180-184, January, 2022

ISSN: 2231 – 5381 /doi:10.14445/22315381/IJETT-V70I1P220 ©2022 Seventh Sense Research Group®

Detection of Text from Lecture Video Images using CTPN Algorithm
Geeta S Hukkeri1, R H Goudar2
1 Research Scholar, Department of Computer Science and Engineering, Visvesvaraya Technological University, Belagavi, India.
2 Associate Professor, Department of Computer Science and Engineering, Visvesvaraya Technological University, Belagavi, India.
[email protected], [email protected]

Abstract — Text data in lecture video images provides a crucial cue for understanding lecture videos. Detecting such data from images involves several stages, which are carried out by text detection methods. The goal of this paper is to study the existing text detection methods and identify the best method among them. The paper presents a deep learning-based methodology known as CTPN that precisely localizes text lines in a natural image during the text detection stage. The CTPN operates reliably on multi-scale and multi-language text without any additional post-processing, departing from previous bottom-up techniques that need multi-step post-filtering. The results of the detected text are shown in this paper.

Keywords — Text detection, text localization, Conventional approach, CTPN Algorithm.

I. INTRODUCTION
Many applications in the fields of image processing and computer vision need to extract the content of a given image. Content in an image is basically divided into two primary classes: a) semantic content and b) perceptual content. Semantic content refers to objects, events, and their relations; it deals with the connections between words, phrases, signs, and symbols. Put simply, semantics is the study of meaning, whereas perceptual content is content that can be perceived directly and includes properties such as color, intensity, shape, texture, and their temporal changes. A great deal of work has been reported on perceptual content, while semantic image content such as text, faces, vehicles, and human actions has attracted renewed interest. This paper focuses on semantic image content in the form of text, since text within an image is of particular interest: it is extremely helpful for describing the content of an image.

When OCR is applied to natural scene images, its success rate drops drastically for several reasons [15]. First, most OCR tools are designed for scanned documents and rely on segmentation to separate text from background pixels. Second, scene text shows a wide variety of designs, such as sizes, fonts, orientations, and colors, together with background anomalies such as character-like structures (windows, bricks, and similar textures). Finally, the page layout for conventional OCR is simple and structured, whereas natural scene images contain far less text, and the structure that does exist has high variation in both geometry and appearance. Because of these issues, scene text images generally suffer from photometric degradation as well as geometric distortion, so many algorithms face accuracy and/or speed problems. Hence, the main task when designing applications that use text data in natural scene images is to transform the image so that current OCR tools can handle it.

The remainder of the paper is organized as follows: Section 2 gives brief related work on text detection modules. Section 3 outlines the application space for text detection (TD) in image processing and presents a comparative study of different text detection methods. Section 4 describes the CTPN algorithm with its implementation, and Section 5 gives the conclusion.

II. RELATED WORK
Motivated by the broad spectrum of potential applications, research on scene text detection and recognition has become very important [24], [25], [36]–[42] and has recently seen a surge of research effort in the computer vision community [26]–[35], [43]–[45]. Broad surveys on text detection and recognition in still images and videos can be found in [46]–[48].

Earlier works used bottom-up methods for text detection, typically grouped into two types: "connected-components (CCs)" based approaches and "sliding-window" based approaches. CC-based approaches separate text and non-text image elements using a fast filter; text elements are then grouped into character regions using low-level features, e.g., intensity, color, gradient, and so on [17, 21, 22].

Sliding-window-based approaches detect text regions by densely sliding a multi-scale window over the image. Each window is classified as text or non-text by a pre-trained classifier using manually designed features [19, 20] or learned CNN features [18]. Different kinds of text detectors have been studied; lately, they can be broadly classified into two styles, namely "conventional text detectors" and "deep learning-based text detectors". Conventional text detectors rely on hand-crafted low-level features and prior knowledge to separate text and non-text regions in an image.

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)



However, these conventional detectors lack robustness to varied text styles and noisy images. To alleviate this issue, many deep learning-based text detection studies have been carried out, and they achieve the best results [14].

III. TEXT DETECTION (TD) SYSTEM
Text in images and video can be classified into two categories: scene text and caption text. Figure 1 shows instances of scene text and caption text. Caption text is artificially overlaid on the images or video frames during editing. Even though some kinds of caption text are more common than others, caption text can have arbitrary font, size, color, orientation, and location. Over the course of a video, a caption text instance may stay fixed and rigid over time, or it may translate, grow, and change color.

Fig. 1 Instances of caption text and scene text.

A good text detection framework should be able to handle as wide a variety of these kinds of text as possible in order to support a content-based indexing framework. It is therefore advisable to apply a general-purpose text detection (TD) system, shown in Figure 2.

Fig. 2 TD System [10][14]

 An input image is sent to the TD system. The TD system has two parts, namely text detection and text localization.
 The text detection part checks whether text exists in a given image or not.
 The text localization part determines the location coordinates of the text residing in the image and creates bounding boxes over the text.

Table 1. Text detection methods

| Method | Advantage | Disadvantage |
| SWT [5][7][8][11] | Correctly detects text in various colors, sizes, and styles. Text detection can be done independently of position and rotation. | Slow performance. If the background or the text is darker, grayscale information is needed. |
| DTLN [1][12] | Recurrently memorizes previous detections to avoid repeated detection of text at multiple scales, and efficiently captures crowded text samples in close proximity. Works robustly in noisy environments. | Slow performance. |
| EAST [2][13] | Written in Keras, which makes it easy to read and run. | It can only detect single words. |
| MSER [5][6][8][11] | Correctly detects text in various colors, sizes, and styles. Text detection can be done independently of position and rotation. Performance is good. | Sensitive to blur. |
| CTPN [3][4][9][10][14] | Correctly localizes text lines in an image. Works accurately on multi-language and multi-scale text with no further post-processing. Best performance, due to the use of a bidirectional LSTM and a text line construction algorithm. | Tilted text boxes are not detected as accurately as with EAST. |
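The two-part TD system described above (detection, then localization) can be sketched with a toy example. The binary "text mask", the helper names, and the box format here are illustrative assumptions, standing in for a real detector's output:

```python
# Toy sketch of the two-stage TD pipeline: stage 1 decides whether text
# exists at all; stage 2 produces a bounding box over the text pixels.
# A binary mask (1 = pixel belongs to text) stands in for a real detector.

def detect_text(mask):
    """Text detection: does the image contain any text at all?"""
    return any(any(row) for row in mask)

def localize_text(mask):
    """Text localization: bounding box (top, left, bottom, right) of text pixels."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))

mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
if detect_text(mask):            # stage 1: text present?
    box = localize_text(mask)    # stage 2: bounding box, here (1, 1, 2, 3)
```

Once such a box is available, the region it encloses can be cropped and handed to an OCR tool.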

Once text has been detected in an image, it can be extracted in the form of plain text by applying a good OCR method.

IV. CTPN ALGORITHM
This paper presents the Connectionist Text Proposal Network (CTPN), which quickly localizes text sequences directly in
convolutional layers [14]. This addresses several major limitations raised by earlier bottom-up text detection methods. The paper draws on the strength of deep convolutional features and presents the CTPN framework shown in Fig. 3. It makes the following significant contributions. First, it casts the problem of text
detection into localizing a sequence of "fine-scale text proposals" [16]. The paper develops an anchor regression mechanism that jointly predicts the text/non-text score of each text proposal, resulting in excellent localization accuracy. Second, the paper presents an "in-network recurrence" mechanism that elegantly connects sequential text proposals in the convolutional feature maps. This connection lets the detector exploit meaningful text line context, making it powerful at recognizing text reliably. Third, the two techniques are integrated seamlessly to fit the sequential nature of text, resulting in a unified end-to-end trainable model. The method can handle multi-scale and multi-language text in a single pass, avoiding additional post-filtering or refinement.

Fig. 3 Framework of the CTPN

The input image is passed to the VGG-16 model [23], and the feature maps output by its conv5 layer are obtained. A sliding window of size 3×3 is moved over the VGG-16 output features, and the resulting vectors are fed sequentially to an RNN, which comprises a 256-D bi-directional LSTM. This LSTM layer is connected to a 512-D fully connected layer, which then generates the output layer.
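The feature flow just described can be traced with a shape-only sketch. The weights are random and a dense projection stands in for the bidirectional LSTM, so this illustrates the tensor dimensions rather than the actual recurrent computation; the 14×14 feature-map size is an assumed value (a 224×224 input downsampled 16× by VGG-16):

```python
import numpy as np

# Shape walkthrough of the CTPN feature flow (random weights, assumed sizes).
H, W, C = 14, 14, 512                      # conv5 feature map of VGG-16
conv5 = np.random.randn(H, W, C)

# 3x3 sliding window over the zero-padded feature map: one 3*3*512 = 4608-dim
# vector per spatial position, gathered row by row as a sequence for the RNN.
padded = np.pad(conv5, ((1, 1), (1, 1), (0, 0)))
windows = np.stack([
    padded[y:y + 3, x:x + 3, :].ravel()    # 4608-dim vector at position (y, x)
    for y in range(H) for x in range(W)
]).reshape(H, W, 3 * 3 * C)                # (14, 14, 4608)

# Stand-in for the 256-D bidirectional LSTM: project each vector to 256-D.
W_rnn = np.random.randn(3 * 3 * C, 256)
rnn_out = windows @ W_rnn                  # (14, 14, 256)

# 512-D fully connected layer that feeds the output layer.
W_fc = np.random.randn(256, 512)
fc_out = rnn_out @ W_fc                    # (14, 14, 512)
```

In the real network, each row of `windows` is processed as a sequence by the LSTM, so predictions at one horizontal position can use context from neighboring proposals on the same text line.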

A. Implementation
This paper implements the CTPN algorithm using its GitHub repository to localize text regions in an image. The Ubuntu operating system was used for this work. The steps are as follows:
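One of the utilities compiled in step b) below is non-max suppression (NMS). Its greedy idea can be sketched in plain Python; this is a simplified illustration, not the repository's compiled implementation, and the box format and threshold are assumptions:

```python
# Simplified greedy NMS: keep the highest-scoring boxes and drop any box that
# overlaps an already-kept box too much. Boxes are (x1, y1, x2, y2, score).

def iou(a, b):
    """Intersection-over-union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, thresh=0.5):
    """Greedily keep boxes in score order, dropping near-duplicates."""
    kept = []
    for b in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(b, k) < thresh for k in kept):
            kept.append(b)
    return kept

proposals = [(0, 0, 10, 10, 0.9), (1, 1, 10, 10, 0.8), (20, 20, 30, 30, 0.7)]
# nms(proposals) keeps the 0.9 box and the disjoint 0.7 box; the 0.8 box
# overlaps the 0.9 box (IoU 0.81) and is suppressed.
```

The repository compiles a much faster version of this loop; the behavior sketched here is the same.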
a) Clone the repository: Clone the CTPN GitHub repository using the git clone command.

b) Build the required library: Generate the .so files for the NMS (non-max suppression) and bbox (bounding box) utilities so that these compiled files can be loaded by the library. Change the working directory to "/text-detection-ctpn/utils/bbox" and build the library using the commands provided in the repository.

c) Test the model: Test the CTPN model by downloading the checkpoints (available in the GitHub repository) and following these steps:
1) Unpack the downloaded checkpoints.
2) Place the unpacked folder "checkpoints_mlt" in the "/text-detection-ctpn" directory.
3) Place testing images in the "/data/demo/" folder; results will be produced in the "/data/res" folder.
Change the directory to "/text-detection-ctpn" and run the test command given in the repository on the input images. The results are produced in the "/data/res" folder; a sample input image and its result are shown in Figure 4.

(a) Input (b) Output
Fig. 4 Text detection from a lecture video image using CTPN

V. CONCLUSION
This paper discussed two types of images: caption text images and scene text images. It dealt with the scene text image and presented a general-purpose text detection (TD) system, which performs detection and localization of the text in a given image. A study of different text detection methods was also presented. From the study of several papers, we found that the CTPN algorithm gives more accurate text detection than the other methods, and the text detection results obtained are truly accurate and very useful.


ACKNOWLEDGMENT
The satisfaction and happiness that accompany the successful completion of any task would be incomplete without the mention of the people who made it possible, whose constant guidance and encouragement crowned our effort with success.
We are very thankful to Dr. Karisiddappa, Vice-Chancellor, VTU Belagavi, who has supported, motivated, and encouraged us to write this paper.
We are indeed grateful to Dr. S L Deshpande, Chairperson, VTU-Belagavi, who has guided and provided all the necessary guidance and support to complete this work.
We would like to express our special thanks to Mr. Rohit Kaliwal and Dr. Mahantesh Birje, Centre for Post-Graduate Studies, VTU-Belagavi, for their guidance, support, motivation, and encouragement to write this paper.
Furthermore, we acknowledge the support of our parents, family, and friends.

REFERENCES
[1] Nils L. Westhausen and Bernd T. Meyer, Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression, Springer, (2020) 1-5.
[2] Niddal H. Imam, Vassilios G. Vassilakis, An approach for detecting image spam in OSNs, Springer, (2020) 1-8.
[3] Langcai Cao, Hongwei Li, Rongbiao Xie, and Jinrong Zhu, A Text Detection Algorithm for Image of Student Exercises Based on CTPN and Enhanced YOLOv3, IEEE Access, 8 (2020) 176924-176934.
[4] Xiangwen Liu, Joe Meehan, Weida Tong, Leihong Wu, Xiaowei Xu, and Joshua Xu, DLI-IT: a deep learning approach to drug label identification through image and text embedding, BMC Medical Informatics and Decision Making, (2020) 1-9.
[5] Anurag Agrahari, Rajib Ghosh, Multi-Oriented text detection in natural scene images based on the intersection of MSER with the locally binarized image, Third International Conference on Computing and Network Communications (CoCoNet'19), Elsevier, (2020) 322–330.
[6] Rituraj Soni, Bijendra Kumar, and Satish Chand, Text Region Extraction From Scene Images Using AGF and MSER, International Journal of Image and Graphics, 20(2) (2020) 1-23.
[7] A. Vishnu Vardhan, Padarthi Maheeja, Manukonda SaiSudhanvi, Nuthalapati Preethi, Mannava Veena, Detection of Text by Enhancing Stroke Width Transform and Maximally Stable Extremal Regions, JCRT, 8(3) (2020).
[8] S. Shiyamala, S. Suganya, Detection, Localization of Text in Images by MSER and Enhanced SWT, International Journal of Innovative Technology and Exploring Engineering (IJITEE), 8 (2019) 2873-2875.
[9] Wenxian Zeng, Qinglin Meng, Shuqing Zhang, Natural Scene Chinese Character Text Detection Method Based on Improved CTPN, IOP Conf. Series: Journal of Physics: Conf. Series, (2019) 1-7.
[10] Chenhui Huang, Jinhua Xu, An Anchor-Free Oriented Text Detector with Connectionist Text Proposal Network, Proceedings of Machine Learning Research, (2019) 631.
[11] Jiang Mengdi, Cheng Jianghua, Chen Minghui, and Ku Xishu, An Improved Text Localization Method for Natural Scene Images, IOP Conf. Series: Journal of Physics: Conf. Series, (2018) 1-8.
[12] Xuejian Rong, Chucai Yi, and Yingli Tian, Unambiguous Text Localization and Retrieval for Cluttered Scenes, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017) 5494.
[13] X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, J. Liang, EAST: an efficient and accurate scene text detector, in Proceedings of the CVPR, (2017) 2642.
[14] Z. Tian, W. Huang, T. He, P. He, Y. Qiao, Detecting text in natural image with connectionist text proposal network, in Proceedings of the ECCV, (2016) 56.
[15] B. Epshtein, E. Ofek, and Y. Wexler, Detecting text in natural scenes with stroke width transform, CVPR, (2010) 2963.
[16] H. Lin, P. Yang, and F. Zhang, Review of scene text detection and recognition, Arch. Comput. Methods Eng., 27(2) (2020) 433–454.
[17] Z. Tian, W. Huang, T. He, P. He, Y. Qiao, Detecting text in natural image with connectionist text proposal network, in Proceedings of the ECCV, (2016) 56.
[18] Jaderberg, M., Vedaldi, A., Zisserman, A., Deep features for text spotting, in European Conference on Computer Vision (ECCV), (2014) 512.
[19] Tian, S., Pan, Y., Huang, C., Lu, S., Yu, K., Tan, C.L., Text flow: A unified text detection system in natural scene images, in IEEE International Conference on Computer Vision (ICCV), (2015) 1.
[20] Wang, K., Babenko, B., Belongie, S., End-to-end scene text recognition, in IEEE International Conference on Computer Vision (ICCV), (2011) 1457.
[21] Huang, W., Lin, Z., Yang, J., Wang, J., Text localization in natural images using stroke feature transform and text covariance descriptors, in IEEE International Conference on Computer Vision (ICCV), (2013) 1241.
[22] Huang, W., Qiao, Y., Tang, X., Robust scene text detection with convolutional neural networks induced MSER trees, in European Conference on Computer Vision (ECCV), (2014) 497.
[23] Simonyan, K., Zisserman, A., Very deep convolutional networks for large-scale image recognition, in International Conference on Learning Representations (ICLR), (2015) 1.
[24] D. Chen, J.-M. Odobez, and H. Bourlard, Text detection and recognition in images and video frames, Pattern Recognit., 37(3) (2004) 595–608.
[25] M. R. Lyu, J. Song, and M. Cai, A comprehensive method for multilingual video text detection, localization, and extraction, IEEE Trans. Circuits Syst. Video Technol., 15(2) (2005) 243–255.
[26] T. E. de Campos, B. R. Babu, and M. Varma, Character recognition in natural images, in Proc. VISAPP, (2009) 121–132.
[27] B. Epshtein, E. Ofek, and Y. Wexler, Detecting text in natural scenes with stroke width transform, in Proc. IEEE CVPR, (2010) 2963.
[28] K. Wang and S. Belongie, Word spotting in the wild, in Proc. 11th ECCV, (2010) 591.
[29] L. Neumann and J. Matas, A method for text localization and recognition in real-world images, in Proc. 10th ACCV, (2010) 770.
[30] C. Yi and Y. Tian, Text string detection from natural scenes by structure-based partition and grouping, IEEE Trans. Image Process., 20(9) (2011) 2594–2605.
[31] A. Coates et al., Text detection and character recognition in scene images with unsupervised feature learning, in Proc. ICDAR, (2011) 440.
[32] L. Neumann and J. Matas, Real-time scene text localization and recognition, in Proc. IEEE CVPR, (2012) 3538.
[33] T. Novikova, O. Barinova, P. Kohli, and V. Lempitsky, Large-lexicon attribute-consistent text recognition in natural images, in Proc. 12th ECCV, (2012) 752.
[34] A. Mishra, K. Alahari, and C. V. Jawahar, Top-down and bottom-up cues for scene text recognition, in Proc. IEEE CVPR, (2012) 2687.
[35] A. Ikica and P. Peer, An improved edge profile based method for text detection in images of natural scenes, in Proc. IEEE EUROCON, (2011) 1.
[36] R. Lienhart and F. Stuber, Automatic text recognition in digital videos, Dept. Math. Comput. Sci., Univ. Mannheim, Mannheim, Germany, Tech. Rep., (1995).
[37] Y. Zhong, K. Karu, and A. K. Jain, Locating text in complex color images, Pattern Recognit., 28(10) (1995) 1523–1535.
[38] V. Wu, R. Manmatha, and E. M. Riseman, Finding text in images, in Proc. 2nd ACM Int. Conf. Digital Libraries, (1997) 3.
[39] A. K. Jain and B. Yu, Automatic text location in images and video frames, Pattern Recognit., 31(12) (1998) 2055.


[40] V. Wu, R. Manmatha, and E. M. Riseman, TextFinder: An automatic system to detect and recognize text in images, IEEE Trans. Pattern Anal. Mach. Intell., 21(11) (1999) 1224–1229.
[41] Y. Zhong, H. Zhang, and A. K. Jain, Automatic caption localization in compressed video, IEEE Trans. Pattern Anal. Mach. Intell., 22(4) (2000) 385–392.
[42] H. Li, D. Doermann, and O. Kia, Automatic text detection and tracking in digital video, IEEE Trans. Image Process., 9(1) (2000) 147–156.
[43] Y.-F. Pan, X. Hou, and C.-L. Liu, A hybrid approach to detect and localize texts in natural scene images, IEEE Trans. Image Process., 20(3) (2011) 800–813.
[44] C. Yao, X. Zhang, X. Bai, W. Liu, Y. Ma, and Z. Tu, Rotation-invariant features for multi-oriented text detection in natural images, PLoS ONE, 8(8) (2013) 1-15.
[45] X.-C. Yin, X. Yin, K. Huang, and H.-W. Hao, Accurate and robust text detection: A step-in for text retrieval in natural scene images, in Proc. 36th SIGIR, (2013) 1091.
[46] D. Chen, J. Luettin, and K. Shearer, A survey of text detection and recognition in images and videos, Institut Dalle Molle d'Intelligence Artificielle Perceptive, EPFL, Martigny, Switzerland, Tech. Rep., (2000) 1-21.
[47] K. Jung, K. Kim, and A. Jain, Text information extraction in images and video: A survey, Pattern Recognit., 37(5) (2004) 977–997.
[48] J. Liang, D. Doermann, and H. Li, Camera-based analysis of text and documents: A survey, Int. J. Document Anal. Recognit., 7(2) (2005) 84–104.
[49] M Krishna Goriparthy and B Geetha Lakshmi, Balanced Islanding Detection of Distributed Generator using Discrete Wavelet Transform and Artificial Neural Network, International Journal of Engineering Trends and Technology, 69(10) (2021) 57-71.
[50] Shankargoud Patil and Kappargaon S. Prabhushetty, Bi-Attention LSTM with CNN based Multi-task Human Activity Detection in Video Surveillance, International Journal of Engineering Trends and Technology, 69(11) (2021) 192-204.
[51] Vencelin Gino V and Amit KR Ghosh, Enhancing Cyber Security Measures For Online Learning Platforms, SSRG International Journal of Computer Science and Engineering, 8(11) (2021) 1-5.
[52] Neha Chaudhary and Priti Dimri, Enhancing the Latent Fingerprint Segmentation Accuracy Using Hybrid Techniques – WCO and BiLSTM, International Journal of Engineering Trends and Technology, 69(11) (2021) 161-169.
[53] Emmanoel Pratama Putra Hastono and Gede Putra Kusuma, Evaluation of Blockchain Model for Educational Certificate Using Continuous-Time Markov Chain, International Journal of Engineering Trends and Technology, 69(11) (2021) 61-70.
[54] Ningxia He, Image Sampling Using Q-Learning, SSRG International Journal of Computer Science and Engineering, 8(1) (2021) 5-12.
[55] Walther Calvo-Niño, Brian Meneses-Claudio, and Alexi Delgado, Implementation of an FM Radio Station to Facilitate Remote Education in the District of Cojata, International Journal of Engineering Trends and Technology, 69(10) (2021) 118-127.
[56] Osama R. Shahin, Zeinab M. Abdel Azim, and Ahmed I. Taloba, Maneuvering Digital Watermarking in Face Recognition, International Journal of Engineering Trends and Technology, 69(11) (2021) 104-112.
[57] Yasser Mohammad Al-Sharo, Amer Tahseen Abu-Jassar, Svitlana Sotnik, and Vyacheslav Lyashenko, Neural Networks as a Tool for Pattern Recognition of Fasteners, International Journal of Engineering Trends and Technology, 69(10) (2021) 151-160.
[58] Hoang Thi Phuong, Researching Robot Arms Control System Based on Computer Vision Application and Artificial Intelligence Technology, SSRG International Journal of Computer Science and Engineering, 8(1) (2021) 24-29.
[59] Nilesh Yadav and Narendra Shekokar, SQLI Detection Based on LDA Topic Model, International Journal of Engineering Trends and Technology, 69(11) (2021) 47-52.
[60] M. Ayaz Ahmad, Volodymyr Gorokhovatskyi, Iryna Tvoroshenko, Nataliia Vlasenko, and Syed Khalid Mustafa, The Research of Image Classification Methods Based on the Introducing Cluster Representation Parameters for the Structural Description, International Journal of Engineering Trends and Technology, 69(10) (2021) 186-192.
