
CHAPTER 1

INTRODUCTION
Images are converted to digital form and manipulated using image processing techniques. These
methods apply mathematical operations that improve image quality and extract
useful information. Image processing typically treats images as two-dimensional
signals and applies established signal processing methods to them: the system takes an
image as input and, based on its characteristics and features, produces another image
(or a description of it) as output, much as transmission and voice signals are processed.
These technologies are widely used in many industries, yet a huge amount of research
remains to be done. It is essential to devise a picture analysis system that uses optical
and thermal sensors to record the three-dimensional visual world. Digital image
acquisition contributes tremendously to the development of such a system. Scenes
are captured in two-dimensional form, and these two-dimensional images are then sampled and
quantized to form digital pictures. Background noise is sometimes present,
which degrades the quality of the images. In many cases the most prevalent source of
compromised pictures is the optical lens of the camera: if the camera is not focused as
expected, the acquired photo is likely to be blurred, since defocusing the camera blurs
the picture. Images are also affected by hazy weather, which is
another way they become corrupted; a photograph taken on a hazy
winter morning will look foggy. Degradation caused by fog, mist, and similar
surrounding conditions is known as atmospheric degradation. Relative motion between
the object and the camera introduces further blur. Image processing can be utilized for video
enhancement, sorting, and restoration, and many application design
companies exploit these properties.

1.1- Image Processing


Digital image processing manipulates images by running algorithms on a digital
computer. As a subcategory of digital signal processing, it has many advantages over
analog image processing: a considerably broader range of algorithms can be applied to
the data, and problems such as noise build-up and distortion can be avoided during
processing.

Since images are defined over two dimensions (and sometimes more), digital image
processing may be modeled in the form of multidimensional systems. Three fundamental
factors have driven the development of digital image processing: the improvement of
computers, the advancement of mathematics (especially the creation and refinement of
discrete mathematics), and the growth of a broad range of applications in the
environment, agriculture, the military, industry, and medical science. Multidisciplinary
fields, such as mathematics and physics as well as optical and electrical engineering,
contribute to image processing, and it draws on a wide variety of further fields such
as pattern recognition, machine learning, artificial intelligence, and human
vision. In terms of imaging workflow, the steps include scanning images or
taking photographs with a digital camera, analyzing and manipulating those
images (data compression, image enhancement, and filtering), and producing the desired
output image.

A key component of image processing is the ability to interpret images and
extract meaning from them. Medicine, industry, the military, and consumer electronics
are all fields where image processing finds applications. In medicine it is extensively
utilized in diagnostic imaging procedures, such as digital radiography, positron emission
tomography (PET), computerized axial tomography (CAT), magnetic resonance imaging
(MRI), and functional magnetic resonance imaging (fMRI). Automated guided vehicle
control, safety systems, quality control, and robotic assembly processes are just some
of the industrial applications.

1.1.1- Complex Image Processing


In addition to soldier identification and vehicle tracking, neural networks and image
processing algorithms are utilized in numerous applications, including missile
guidance, object recognition, and reconnaissance. Several biometric methods used by
law enforcement and security also rely on them, including fingerprint, face, and
iris recognition.
Images are also processed in consumer electronics items such as digital cameras and
camcorders, high-definition TVs, monitors, DVD players, personal video recorders,
and cell phones.

1.1.2- Digital Image


A digital image can be modeled as a two-dimensional function f(x, y), where x and y are
spatial coordinates; the amplitude of f at any pair of coordinates is called the intensity
of the image at that point. A digital image is one in which 'x', 'y', and the amplitude
values of 'f' are all finite and discrete. Digitizing the coordinate values is referred to
as 'sampling', while digitizing the amplitude values is called 'quantization'. Sampling
and quantization together yield a matrix of real numbers.
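As a brief, hedged illustration (using 'peppers.png', a sample image bundled with MATLAB rather than data from this thesis), a digital image can be handled directly as a sampled, quantized 2-D function:

RGB = imread('peppers.png');   % sample truecolor image shipped with MATLAB
I = rgb2gray(RGB);             % 2-D array of uint8 intensity values
f = I(120, 200);               % intensity f(x,y) at row 120, column 200
[M, N] = size(I);              % sampling: a finite, discrete M-by-N grid
L = 256;                       % quantization: uint8 allows 256 amplitude levels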

FIGURE 1.1 – Digital representation of an image

1.2- Maximally Stable Extremal Regions (MSER)


Like the SIFT detector, Maximally Stable Extremal Regions (MSER) is a feature
detector. The MSER algorithm extracts from an image a number of covariant regions,
called MSERs. Each MSER is a stable connected component of a set of gray-level
thresholdings of the image: the algorithm keeps regions whose shape remains nearly
unchanged over a broad range of thresholds.

FIGURE 1.2 – Image Representation of MSER

1.3- MATLAB
MATLAB, shorthand for "matrix laboratory", is used to tackle mathematical and
scientific problems. MathWorks created this proprietary programming language, which
supports matrix manipulation, functions, and data plotting, as well as further
computation and communication with programs written in languages such as C, C++,
Java, etc.
The MATLAB Image Processing Toolbox (IPT) is a collection of functions that extend
the capabilities of the MATLAB numeric computing environment; it provides a
comprehensive set of reference-standard algorithms and workflow applications for
analyzing, visualizing, and developing algorithms.
In general, it is used for segmentation, enhancement, noise reduction, geometric
transformations, registration, and 3D image processing. IPT functions, many of which
are implemented in C/C++ code, are essential for building working prototypes and
deploying vision systems.
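A minimal, hedged sketch of a typical IPT workflow follows; the image name and the choice of a median filter are illustrative assumptions, not part of this thesis's method:

I = imread('cameraman.tif');   % grayscale test image bundled with IPT
J = medfilt2(I, [3 3]);        % 3-by-3 median filter for noise reduction
imshowpair(I, J, 'montage')    % compare the original and filtered images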

1.4- Text Detection Using MATLAB

Apps that use cameras to analyze scene text are gaining new attention, since they focus
specifically on mobile and wearable devices. Even though recent mobile devices on the
market include high-quality cameras of up to 12 megapixels and impressive quad-core
processors, they still have numerous shortcomings in comparison to standard PCs, for
instance limited memory, limited floating-point capability, and little internal storage.
Wearable devices with low resources should nevertheless be able to receive real-time
information about scene text. Rather than concentrating on the device technology, this
thesis presents a method for finding text after it has been captured in an image or video.
This enables the algorithm to be applicable to all low-performance devices. Using an
efficient, fast text localization strategy combined with the approach that follows, we
show how a safe execution strategy on such devices can be more productive than ever.

As part of the methodological contribution, we offer a basic yet effective methodology
for locating and subsequently exploiting text, suitable for devices with low processing
power. It also supports video annotation, video retrieval, and image annotation
approaches. A novel method is presented in this thesis for identifying text in natural and
complex pictures. Based on a two-stage approach that combines maximally stable
extremal regions (MSER) and stroke width variation, it is intended to recognize text
precisely. Canny edge detection is subsequently performed on the MSER image in order
to enhance the edges. As a geometric filter, the stroke width data is used to exclude
non-text areas of the image. Trials of the proposed technique demonstrate its
effectiveness.

The Canny edge detector is the most widely recognized operator among the pioneering
edge detection techniques: thanks to its single response to an edge and its good
localization, it is among the most widely used. A standard non-maxima suppression step
is applied to determine ideal text edges, since it results in edges that are one pixel wide.
Furthermore, MSER is combined with Canny edge detection to provide improved, or
better put, edge-enhanced results: MSER regions are trimmed by the edges obtained
from the Canny detector, and pixels outside the edges are filtered away.
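A hedged MATLAB sketch of this edge-enhanced MSER idea follows; 'scene.jpg' is a placeholder input, and the intersection step is one plausible reading of the procedure described above:

I = rgb2gray(imread('scene.jpg'));       % placeholder input image
[~, mserCC] = detectMSERFeatures(I);     % MSERs as connected components
mserMask = false(size(I));
mserMask(vertcat(mserCC.PixelIdxList{:})) = true;
edgeMask = edge(I, 'canny');             % one-pixel-wide Canny edges
edgeEnhanced = edgeMask & mserMask;      % keep only edges inside MSER regions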

FIGURE 1.3 – Text Detection Representation

1.4.1- TEXT EXTRACTION USING MSER

Natural scene image analysis requires the detection and recognition of text among the
varying elements of the scene in order to extract information from the image. The
purpose of this work is to establish an accurate and effective approach that detects
enhanced Maximally Stable Extremal Regions (MSERs) as the main character
candidates. These character candidates are then filtered by their stroke width variation
in order to discard regions that exhibit too much stroke variation. To determine which
regions of a natural image contain text, some pre-processing is performed first, then
MSERs are detected, and the intersection of the Canny edges and the MSER regions is
computed to identify the regions most likely to be text.

Several advances have been made in scene text detection by means of MSERs
(Maximally Stable Extremal Regions). This pixel-level operation, however, is hampered
by being low level, which limits its capacity to handle complex content effectively
(such as connections between text and background parts), leading to an inability to
distinguish text from background segments. A convolutional neural network (CNN) is a
powerful tool that we utilize in this work to address this issue.

While conventional techniques use a mixture of low-level heuristic features, a CNN
is able to learn higher-level features to distinguish textual elements from
other patterns (such as bicycles, windows, or leaves). We consider both MSERs and
sliding-window strategies when developing our methodology. The MSER operator
dramatically reduces the number of windows to check and improves detection of
low-quality text.

The most noticeable areas of a natural scene can be identified using visual
attention models; it is these places that capture human attention. However,
state-of-the-art models continually underestimate the large image
regions containing text. Saliency-based applications like image classification and
captioning can build on these regions, as they are explicit semantic regions in a scene.
Text and character recognition in pictures therefore remains a difficult research problem.

Information about the picture is contained in the textual content of the scene; a
billboard, for example, conveys important information that a visually impaired
individual cannot access. In this thesis, we propose a new model for salient text
detection in a natural scene. By combining a saliency model with a text localization
approach, the proposed model achieves text saliency.

1.5- STROKE WIDTH DETECTION

The task of estimating the stroke width at each image pixel is handled by a novel
image operator. Despite being data-driven and local in nature, this operator is fast and
requires no multi-scale computation or window scanning.

We form text lines by grouping letters together, and we perform additional checks to
eliminate false positives. A generalized cycle of the text detection approach is
presented; it can be combined with an optical character recognition procedure to assist
with text recognition, and it can furthermore be used for text-based detection.
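A hedged sketch of such a stroke-width operator; 'text_mask.png' is a placeholder for a single-channel binary image of text strokes, and the distance-transform formulation is an assumption consistent with the description above:

bw = imread('text_mask.png') > 0;    % assumed binary mask: true on stroke pixels
D = bwdist(~bw);                     % distance from each stroke pixel to background
skel = bwmorph(bw, 'thin', inf);     % 1-pixel-wide stroke centerlines
widthMap = 2 * D .* double(skel);    % stroke width is roughly twice the centerline distance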

Figure 1.4 – Representation of Stroke Width Detection

CHAPTER-2

LITERATURE REVIEW
2.1. Introduction

During this process, interesting characteristics of the images are studied and valuable
facts are extracted from them. Image processing systems usually treat the photos as
two-dimensional signals and apply established signal processing methods to them, with
the image serving as both the input and the output, just as with transmission and voice
signals. It is among the most rapidly developing techniques for working with pictures,
it is widely distributed across industries, and an enormous amount of study is occurring
now. A picture acquisition system built on sensors in both the optical and thermal
frequencies is a good beginning towards constructing a picture evaluation framework.
Sensors collect data about the three-dimensional visual world, which is translated into
images in two-dimensional form; the resulting pictures are then sampled and quantized
so that digital pictures appear. It has been observed that, in some cases, the pictures
exhibit background noise, degrading their quality. It is perhaps of little surprise that
the optical lens of the camera is one source of degraded pictures, since a picture gets
all its visual information through it. If the camera is not appropriately focused when a
picture is acquired, the resulting picture will be defocused. Pictures are also sometimes
captured in hazy weather conditions, which can likewise damage them: those who take
pictures on foggy winter mornings will see that haziness reflected in their pictures. In
all likelihood such a picture is corrupted by fog and mist in the surroundings, and this
kind of degradation is known as atmospheric degradation. There may also be relative
motion between the object and the camera. In terms of picture handling there are a few
notable applications, namely image enhancement, filtering, and restoration, and each
application has its own strengths and weaknesses in its field.

India is largely a non-English-speaking country with more than 20 major languages and
several local languages. However, many people who visit India find it difficult to find
places to eat or stay, or to have a satisfying meal, because of the language barrier,
differences in weather and climatic conditions, and many other adverse conditions. It
has also been found that many local tourist areas are still unidentified, and many local
shops that sell the glory of their areas are unable to convince the tourists to buy,
because communication acts as a barrier. Time and hard work are also needed for this
to succeed. An image processing approach assists in identifying the different languages
of every corner of the world: text can be seen in any place, on hoardings, road signs,
banners, newspapers, or even a note. The MSER technique can also be used for
detecting contamination of crops by pests and insects. India is mainly dependent on the
agriculture industry, pests are a worry for many farmers, and this technique could
create a revolution there.

Indians rely heavily on agribusiness, and crop diseases have contributed to the
degradation of financial development in the country. To distinguish the illnesses
affecting plants, some fundamental steps need to be taken. The first step is
photographing the affected plants with a digital camera and obtaining a proper picture.
Pre-processing of the acquired images follows, during which the features are extracted
in preparation for further inspection. After the feature extraction steps, the images are
classified into distinct infections based upon different analytical separation methods;
a learning algorithm is run to assign specific images to specific infections. This makes
discerning the diseases and finding the remedy for the infected plants easier. Pixels
are checked by comparing images against data sets to determine their significance, and
within a given data set the plants are grouped according to the type of disease. Deep
learning techniques are applied to identify and classify the infections. With an
artificial neural network, after the features are extracted, each picture's network
response is determined from the contents of the database. These feature vectors also
serve as input for the support vector machine classifier, which is designed to conduct
regression, classification, and pattern recognition on the data. This classifier is
extraordinary when compared to other classifiers proposed by specialists because of its
highly generalized results, achieved without extra tuning of its own.

Kaur et al. - An image processing method is used to examine degradation in organic
harvests, focusing on the distortions in the crops. These are detected by examining the
crop pictures in a very thorough way, utilizing image processing. According to the
investigation, the approach successfully identifies illnesses in fruits and vegetables,
and in contrast to other manual methods it takes an exceptionally short period of time.
Since noise likewise destroys the images, the idea of denoising is also described here.
The researchers note that, according to this exploration, blight has become a common
disease that contaminates numerous plants.

Gomathi et al. - A computer-aided diagnosis system takes advantage of the FPMC
algorithm and improves accuracy through segmentation. The malignant nodules are
characterized after segmentation using a rule-based strategy, and an Extreme Learning
Machine assists the learning process to make sure they are properly classified.

Patil et al. - The importance of image segmentation in clinical image analysis was
emphasized. Using it, one can discover whether a disease is absent or present. Texture
features are analyzed using the Gray Level Co-occurrence Matrix (GLCM) technique.
The method measures the two fundamental types of lung cancer, namely the small-cell
and non-small-cell types, together with the TB registry.

Gu et al. - For text whose pixels are lighter or darker than the background (light
text on a dark background), they proposed using the difference between the closing
operation and the original image for text detection. Using a huge filter to extract large
characters is computationally expensive; however, the technique is compelling, and it
handles small characters invariantly. This is done by taking the difference between the
closing (white-dilated) image and the opening (white-eroded) image using a disk filter
of 3 pixels. A binary image is built from the filtered images; thereafter, connected
components (CoCos) are extracted, and the few small characters located in connected
text areas are recognized using this technique. Since text typically has a series of
characters positioned horizontally, the final candidate text regions are taken from
horizontally long areas of the output image (i.e., 1 < width/height < 25).

Hase et al. - Plant diseases are identified by the proposed framework using images
prepared with image processing. The key benefit of the proposed approach was the
early identification of diseases through an integrated approach, which rewards field
crops. Infected plants are hard to distinguish by the human eye, so detecting diseases
at the right moment and in the right location is crucial. A presentation explains the
cause and classification of diseases using an Android application. Traditionally, the
farmer views the plants with the unaided eye and determines which ones are diseased;
there are several disadvantages to that practice, including that it is extremely
time-consuming and expensive, it requires the supervision of specialists, and it is hard
to inspect the crop over massive acreages.
Ranjith et al. - The proposed irrigation technique is a smart system that controls the
irrigation supply via an Android application. Furthermore, the captured pictures are
forwarded to the cloud server for additional processing and compared with the
database of infected plant pictures. Through the utilization of cloud servers, which
allow customers to recognize diseases from anywhere in the world, clients can identify
infections using their Android phones. It is possible to control the whole irrigation
framework with a mobile application through a cloud server, and all irrigation-related
aspects of the framework are addressed; the irrigation system is tuned over a wide
range of soil moisture content and fluctuating temperatures. By capturing a picture of
the affected part and uploading it to the cloud server for further processing and
detection of the infection, the camera can be used to identify the impacted area of the
plant.

Khan et al. - The proposed technique describes a machine vision system that identifies
plant infection symptoms and analyzes the pictures in the CIELab color space. Using
cascaded unsupervised image segmentation, this design seeks to make plant diseases
easy to locate. Further, an RGB color model is used for the captured pictures and a
CIELab color model for the pre-processing step that determines the timing of each
channel. A multilevel segmentation technique using expectation maximization with the
least possible loss of visual data was likewise proposed by the researchers. Several
investigations were conducted, and the results demonstrate that the new cascaded
scheme produces an unmatched color segmentation with good recognition of
contaminated areas.

Dhaware et al. - An image processing method is proposed in order to determine the
condition of plants. Using this methodology, farmers are given recommendations for
improving outcomes. The analysts concluded that pictures of the infected plants should
be captured with a mobile camera and sent directly to the DSS without any additional
adjustments. India's agronomic performance has decreased significantly due to
infections, and farmers face a number of issues, including controlling infections on
produce and harvests. A critical aspect of conquering infections is detecting them in
the early stages, which entails prudent analysis and proper treatment. The system
proposed in this paper focuses largely on how to recognize and locate plant illnesses.
The process includes four phases: pre-processing, segmentation, feature extraction,
and classification. The specialist was primarily responsible for segmenting and
grouping the photographs.

Ashourloo et al. - Using spectroradiometers covering electromagnetic wavelengths
between 350 and 2500 nm, the infected and non-infected leaves were observed to show
a wide range of contaminating and non-contaminating effects. They estimate the
ailments and their signs based on photographs taken with a high-tech camera. AI
methods are used in the paper predominantly to analyze plant diseases for the purpose
of identifying and classifying them; the proposed methodology is assumed to assist the
recognition and classification of illnesses.

Ye X - Calculation of 3-D geometric estimates and statistical intensity features is
proposed as the method of locating ground glass opacity (GGO) and solid nodules.
Raman - The proposed review of image improvement procedures is mainly based on
point processing techniques, while a portion of the image processing methods are used
to improve contrast.

Jia Tong et al. [2007] - In order to identify malignancy, certain steps are followed,
such as segmenting the lung parenchyma, identifying suspicious candidate nodules, and
extracting and classifying their features. Various threshold segmentations, geometric
morphology, Gaussian filters, and Hessian matrix processes were utilized by the author
here. A significant property of language is its subjectivity, according to Wiebe et al.;
theories of word subjectivity are well suited to word sense disambiguation and may be
related to words that signal subjective comments.

Theresa et al. - Phrase-level sentiment analysis. Initially, it determines whether a
statement is neutral or polar, and then it determines the polarity of the polar statement.
This methodology distinguishes the relevant polarity over a large scope, achieving an
articulation of sentiments fundamentally more effective than the standard, though it
does require some time.

Pang et al. proposed an original methodology in sentiment classification. Their finding
is that machine learning techniques, like NB or SVM, achieve lower accuracy on
opinion classification than on conventional topic-based categorization. Their
classification accuracy calculation utilizes unigrams alone as features.

He et al. (2017a) propose a detection pipeline that likewise comprises several stages.
At first, text blocks are extracted; at that point the model crops and utilizes just
the extracted text blocks to derive text center lines (TCLs), which are defined as a
shrunk version of the original text line, where every text line indicates the presence
of one text instance. The extracted TCL map is then separated into several parts. Each
part of the map is then connected to a picture; a semantic segmentation algorithm now
sorts every pixel into those that overlap text with the given TCL and those that do
not.

Zhang et al. (2016) first predict a segmentation map indicating text line regions. The
MSER (Neumann and Matas, 2012) algorithm is applied to every text line region to
extract character candidates. The character candidates provide data about the size and
orientation of the text lines. As a final step, the minimal bounding boxes are extracted
as candidates for the final text lines.

Yao et al. (2016) - A convolutional neural network predicts, for each pixel of the
input picture, whether it belongs to a text region and which way the text is oriented.
In identifying characters or text regions, the connected positive responses are taken
into consideration. A Delaunay triangulation (Kang et al. 2014) is performed for
characters belonging to text located in the same place, and a graph partition is then
calculated in which text lines are grouped according to the predicted orientation
attribute.

Ma et al. (2017) - In rotation region proposal networks, rather than square or
axis-aligned proposals, rotated region proposals are generated to cover arbitrary
orientations. Zhang et al. (2019) propose to recursively apply the RoI and localization
branches to refine the predicted position of the text instance. Including features at
the boundary of bounding boxes is an effective method for capturing features, more
effective than plain region proposal networks (RPNs).

Ekman et al. and Winton et al. - Some of their principal discoveries found large
differences in autonomic nervous system signals across a relatively small number of
emotional classes or dimensions, but no investigation was done into robotics.

Zhong et al. - The text picture is transformed with the discrete cosine transform to
capture both horizontal and vertical properties. Chen et al. utilized the AdaBoost
algorithm to boost a set of weak classifiers into a strong text classifier; CCs are
consolidated using geometric filtering to make text lines. Epshtein et al. proposed an
algorithm which utilizes the CCs in a stroke width transformed picture, produced by
shooting rays from edge pixels along the gradient direction.

Shivakumara et al. - Text straightness and edge density are used to identify CCs by
applying K-means clustering in the Fourier-Laplacian domain, and false positives are
removed afterwards. The proposed scheme has been the subject of numerous
investigations, and a wide array of tests shows that it outstrips the most recently
published algorithms. During our own calculations, we determine the stroke width by
computing the distance between each pixel and its nearest background pixel using the
distance transform.

Weinman et al. - To recognize scene characters, a Gabor-based appearance model, a
language model based on co-occurrence frequency and letter case, a similarity model,
and a dictionary model are used together.

Neumann et al. proposed a real-time text localization and recognition system based on
extremal regions. Smith et al. designed a similarity model based on SIFT, maximizing
the posterior probability of similarity constraints with integer programming. Mishra
et al. combined bottom-up character recognition and top-down word recognition using a
conditional random field. Lu et al. used a dictionary of shape codes to distinguish
characters and words in scanned reports without OCR, modeling the inner character
structure.

Coates et al. - A variant of K-means clustering was used to extract neighbourhood
features of character patches, which were then pooled over cascading sub-patch
configurations. A full evaluation of character recognition over a major collection of
scene text characters was completed for the purpose of designing a discriminative
feature representation of scene text characters. Using a part-based tree structure
model, Latent-SVM discriminates text characters within text areas, and conditional
random fields distinguish words within text areas. In order to recognize text
characters in various languages, the Scale Invariant Feature Transform (SIFT) was
adopted, and a voting, geometric-verification calculation was introduced to weed out
false positives.

Lucas et al. report using a normalized database to examine the execution of text
detection algorithms. Lucas et al. didn't provide details about the algorithms they
evaluated, and none of those strategies rely on AdaBoost learning. We show in this
paper a more comprehensive evaluation than Lucas et al.; however, because the datasets
used are unique, it is necessary to conduct a more precise comparison on an equivalent
dataset. The dataset will be made available to the testing community.

Huang et al. - Text detection is achieved with a two-layer CNN and an MSER detector.
Jaderberg et al. and Wang et al. carried out detection and recognition tasks using
CNN models. These models, based on conventional CNNs, are usually trained on binary
text-only labels, which is relatively uninformative for learning meaningful text
features. The researchers computed a deep feature for distinguishing text and non-text
from a global image patch. A feature of this type typically includes a great deal of
background information, which may adversely affect its discriminative power and
robustness. Wang et al. explored a Random Ferns classifier with a Histogram of
Oriented Gradients (HOG) feature for text detection.

Pan et al. - The goal is to create a strong classifier that combines HoG features with
WaldBoost. The fundamental challenges lie in organizing the discriminative feature and
in improving computational tractability by reducing the number of windows. Huang et
al. proposed a Stroke Feature Transform (SFT) detector built by fusing three new cues:
stroke width, perceptual divergence, and HoG at edges, which together achieve solid
ability for identifying text patterns. The MSER detector was then evaluated to achieve
a decent recall in component-level detection of text patterns.

Yao et al. proposed a dictionary search approach that could be created to improve
recognition performance within a framework of multi-oriented text detection and
recognition, with the same features being used for both. In Yin et al., another
multi-orientation text detection framework is presented by proposing a novel text line
construction technique; a pre-trained adaptive clustering algorithm was used to group
character candidates sequentially from coarse to fine. Yao et al., using an explicit
manner of text segmentation, produce a quite accurate prediction of orientations.
However, the proposals generated by RPNs are not entirely effective for scene text
detection, so it is necessary to implement a detection framework that encodes
rotational information within the regions.

Jaderberg et al. - Their text spotting framework exceeds the text spotting results on
some of the benchmarks, with Edge Boxes used as the main region proposal strategy to
identify text. The Connectionist Text Proposal Network (CTPN), in addition to its
detection-based framework, uses a CNN and an LSTM to anticipate the text area and
make powerful proposals based on the location of the text in a scene.

Lee et al. proposed an improved K-means algorithm that accounts for edge constraints
when identifying stroke candidates. However, even if stroke segmentation performance
could be improved, the number of color seeds must be chosen manually, which makes the
segmentation inflexible.

Jung et al. additionally find text in complex colored photographs using connected
component analysis. In spite of its established methodology, it fails when text lines
with characters of various colors are present. Also, connected-component-based methods
work for caption text on plain background pictures, but not for pictures with cluttered
backgrounds, and in particular not for multi-oriented scene text.

Li et al. utilized the mean and the second- and third-order moments in the wavelet
domain as features, with a neural network classifier used to find text blocks. Zhong
et al. - In the JPEG/MPEG compressed domain, the texture features derived from the
DCT coefficients may be insufficient to recognize text against a complex background
due to the limitations of that space. Kim et al. proposed a texture-based technique
utilizing support vector machines (SVM): to classify text and non-text pixels, the
technique utilizes an adaptive mean shift algorithm alongside the SVM. It is difficult
to separate text from non-text using pure texture features on an unpredictable
background, even though the SVM-based learning approach makes the algorithm fully
automatic; it remains possible to separate text using general textures even when the
features do not meet the requirements. Li et al. proposed an algorithm in which the
first- and second-order moments of a wavelet decomposition are used as local region
features; the merged text regions are classified using a neural network and then
projected back to the original image map at each pyramid layer.

Liu and Dai proposed that while edge-texture features without a classifier can be used
for text recognition, countless features are needed to separate text from non-text
pixels. These methods are also used when searching for video text; however, they are
extremely dependent on the classifier used.

Kim et al. proposed a texture-based strategy in which Support Vector Machines (SVMs)
are employed to evaluate text and non-text pixels, utilizing an adaptive mean shift
algorithm in combination with the SVMs.

Wu et al. proposed a technique that extracts vertical strokes by tracking nine
second-order Gaussian derivatives over the image chip; an edge feature was implemented
that based its output on a square grid. Algorithms are also applied to a chip to check
basic properties, such as height and width estimates. By utilizing gradient features
and neural network classifiers, text can be found in images and videos.

Chen et al. (and Wong and Chen) used edge features and a morphological close
operation to recognize candidate text blocks, and this blend of edge and gradient
features was utilized for proficient text recognition with few false positives. The
approach was based on the assumption that text has a symmetrical structure. Cai and
Lyu utilized a similar approach for dealing with low-contrast text detection, where
they used mask treatment to control contrast by setting ad hoc thresholds, which
regularly produces numerous false positives since the background can likewise have
strong edges (gradients). By utilizing color-feature-based clustering, Mariano and
Kasturi demonstrated automatic caption text detection in videos. A uniform color
strategy works on text lines where the characters and words are uniform in color.
Using temporal color to recognize and extract text from complex video scenes is the
technique proposed in that article; the use of color transients in producing change
maps may produce outstanding results for overlay text, but not for scenes with complex
backgrounds.

Neumann and Matas utilized maximally stable extremal regions (MSER) to identify
text. Chen et al. combined Canny edges with MSER in order to overcome MSER's
sensitivity to blurred images and to identify smaller text regions at limited
resolutions. To bring together geometric information about the text regions, Shi et
al. used graph models built on MSERs.

Yin et al. utilized different calculations to prune the MSER results. MSERs have
additionally been utilized in numerous papers to extract candidate character data and
have shown promising text localization results on commonly used data sets. One of the
principal perks of utilizing MSER for character extraction is its capability to discern
text regions regardless of their scale. It is likewise resilient to noise and affine
illumination changes. The MSER pruning issue has been studied by Carlos et al. and
Neumann and Matas.
Carlos et al. introduced an MSER pruning algorithm consisting of two steps: (1)
reduction of linear segments by maximizing a line energy function; and (2) hierarchical
filtering by a series of filters. Neumann and Matas proposed an MSER++-based text
detection method, which exploits rather convoluted features to detect text, e.g., a
high-order property and an extensive search system to prune content. Afterward,
Neumann and Matas introduced a two-stage algorithm in which an exhaustive search
methodology is used to prune Extremal Regions (ERs). ERs are ranked by a classifier
composed of incrementally computed descriptors (area, bounding box, perimeter, Euler
number, and horizontal crossings), and preferred ERs are selected by comparison with
probability limits in the ER inclusion relation. At the next level, ERs that have
passed the initial stage are labeled, and non-characters are rejected using more
extensive features. Although the aforementioned analysis methods all investigate
hierarchical structures of MSERs, they have utilized various methods for evaluating the
probabilities of MSERs being characters. Due to the large number of repeated MSERs, the
researchers have applied relevant features in the pruning process (cascading filters
and incrementally generated descriptors).

Chen et al. pairwise-classified candidates into clusters based on stroke width and
height differences, and used a straight line to fit the centroids of the groups. If a
line contained at least three characters, it was called a text candidate. Pan et al.
propose an approach based on grouping character candidates into a tree utilizing a
spanning tree calculation with a learned distance metric. A model of energy
minimization is used to create text candidates by cutting off edge text. By and large,
the above rule-based techniques require hand-tuned parameters, whereas the
clustering-based strategy is complicated by the additional post-processing step, where
one needs to define a somewhat convoluted energy model. Chen et al. proposed a process
of text detection for signs combining multi-scale Laplacian of Gaussian (LOG) edge
detection, multi-feature analysis, and comparative normalization. Lastly, the layout is
analyzed using a Gaussian mixture model (GMM).

The hybrid strategy introduced by Pan et al. uses a region detector to search for text
candidates and then binarizes those segments into character candidates using local
binarization; non-characters are discarded using Conditional Random Fields, and the
text is finally assembled character by character. Maximally Stable Extremal Regions
(MSER) based techniques have lately advanced to the point where they have become the
subject of ongoing research projects. MSERs yield character candidates that can be
associated with connected components.

Jung et al. define an integrated image text information extraction framework with four
phases: text detection, text localization, text extraction and enhancement, and
recognition. Of these four, text detection and localization are the foundations of the
general framework. There have been numerous proposals addressing the detection and
localization of text in images over the past decade, and some of these strategies have
achieved notable results for specific purposes. However, text detection and
localization in common scene images is a challenging problem because of the variety of
text styles, sizes, colors, and alignment directions in images, and it is frequently
affected by complex backgrounds, illumination changes, and image distortion and
degradation.

Lyu et al. identify candidate text edges at different scales with a Sobel operator. A
local thresholding scheme insensitive to illumination separates out edges not
containing text, with recursive profile projection analysis (PPA) sifting and
presenting the text in a recursive pattern. With a gradient map and color derivative
operators, the Lienhart and Wernicke strategy processes the color data for text
localization in videos; afterwards, a recursive PPA measure for text localization is
applied with a neural network classifier coupled with multi-scale fusion.

Chen et al. proposed an approach based on multi-scale Laplacian of Gaussian (LOG)
edge detection, adaptive search, color analysis, and affine normalization; the approach
successfully detects progressive sign content. The layout analysis is completed using a
Gaussian mixture model (GMM).

Weinman et al. utilize a CRF model for patch-based text detection. This technique lends
credence to the inherent usefulness of adding context to local region-based text
detection; they found that their method was capable of handling texts of various sizes
and orderings. A fast text detection algorithm has also been proposed that relies on a
cascade AdaBoost classifier, with the weak learners selected from a feature pool
containing gray-level, gradient, and edge features. After that, the identified text
areas are merged into text blocks, on which local binarization is conducted in order to
segment the text. According to results on the ICDAR 2005 competition dataset, this
strategy performed significantly better and was several times faster than other
methods.

Zhu et al. first utilize a nonlinear local binarization calculation to segment
candidate CCs. In order to train the AdaBoost classifier to reject non-text segments at
coarse-to-fine levels, a variety of features are used: geometric, edge contrast, shape
regularity, stroke statistics, and spatial coherence.

Takahashi et al. extract candidate text segments using the Canny edge detector on
color images. In the second step, the segment features and segment relations within a
neighbourhood are used to compute region adjacency graphs, and some heuristic rules are
used as a pruning procedure to remove non-text segments.

Liu et al. proposed a strategy in which, during feature fitting, a GMM is used to build
a global model that incorporates neighboring information while using a specific
training measure: maximum-minimum similarity (MMS). Their examinations show excellent
performance on multilingual datasets.

2.2 Objective of the Study


The current study is based upon text detection using MSER in machine learning,
as it has a wide range of applications in medical science, the automobile sector, and
artificial intelligence industries. As these industries are the backbone of any country,
a high degree of attention is paid to studying the behavior of text detection. This
work is done using MATLAB code and the Image Processing Toolbox.

The objectives of this paper are:

i) To identify text in a natural scene robustly.

ii) To design MATLAB code for text extraction.

iii) To analyze the real-time use of this application.

iv) To examine whether this application can be used by visually impaired persons.

v) To evaluate the results and collect the observations.

2.3 Scope of the work

Extensive work has been done on different types of image text detection and extraction,
and it can be extended for visually impaired people. It can be used in conditions of
low visibility, such as foggy, rainy, or dusty weather, where reading text is
difficult. It can also be used where the language is unknown, by inserting a
translation step in MATLAB after extraction. It has a wide research area in medical
studies, where it can be used for patients with dyslexia, and a real-time text detector
can be developed for people who are unable to read. The future of text extraction
offers a wide range of research areas.

2.4 Conclusion
From the above-mentioned literature, detecting and localizing text in picture
scenes is the most essential property of content-based image assessment. Due to
cluttered settings, unusual lighting, and the variety of text styles, sizes, and line
lengths, this problem is very demanding. A hybrid approach to the detection and
localization of embedded text is presented. Text confidence and scale information are
estimated over an image pyramid by a text detector, which helps partition candidate
text components through local binarization. With the help of a learning-based energy
minimization system, text segments are grouped into text lines/words. Learning
throughout all three stages means there are relatively few parameters that need to be
adjusted manually.

CHAPTER 3

METHODOLOGY

3.1. Introduction
In this chapter, we discuss the methods followed to achieve the required objectives.
The MSER technique is used to fulfill them, and different types of pre-requisite images
are used for testing and acquiring the results. The process involves various steps,
grouped into the three major stages pictured below. The detailed steps used to acquire
the required objective are described in what follows.

1. Pre-processing 2. Solution 3. Post-processing

Figure 3.1 – Basic Structural Representation

Read the image and convert it from RGB to grayscale → Detect candidate text regions
using MSER → Remove non-text regions based on basic geometric properties → Merge text
regions for the final detection result → Display the detected and extracted text.

Figure 3.2 – Block Diagram of Text Detection

3.2. Read the image and change it from RGB to grayscale


I = rgb2gray(RGB) converts the truecolor image RGB to the grayscale image I. The
rgb2gray function converts RGB images to grayscale by eliminating the hue and
saturation information while retaining the luminance. This conversion can also run on a
GPU if Parallel Computing Toolbox™ is installed.

Figure 3.3 - Code Representation for conversion of image of rgb to gray
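A minimal sketch of the code shown in this figure (the input file name is a placeholder):

RGB = imread('image.jpg');    % read the truecolor input image (placeholder name)
I = rgb2gray(RGB);            % drop hue and saturation, keep luminance
figure, imshow(I)             % inspect the grayscale result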

3.3. Detect Candidate Text Regions Using MSER


The MSER feature detector works well as a text region detector, because the consistent
color and high contrast of text lead to stable intensity profiles.
Using the detectMSERFeatures function to locate the image's distinct regions and
plotting the results, you will notice that many non-text regions are detected in close
proximity to the text.

Figure 3.4 – Code representation to detect MSER region
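A hedged MATLAB sketch of this step; the parameter values are the ones reported later in this thesis (region area range [200, 8000], threshold delta 4):

[mserRegions, mserConnComp] = detectMSERFeatures(I, ...
    'RegionAreaRange', [200 8000], 'ThresholdDelta', 4);
figure, imshow(I), hold on
plot(mserRegions, 'showPixelList', true, 'showEllipses', false)
title('MSER regions')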

3.4. Remove Non-Text Regions Based On Basic Geometric Properties
Although the MSER algorithm picks out most of the text, it also identifies numerous
other stable regions in the picture that are not text. You can remove non-text regions
using a rule-based approach: for example, it makes sense to exploit the geometric
properties of text to filter out non-text data using simple thresholds. You could also
employ machine learning to train a classifier that differentiates between text and
non-text. Most often, a combination of the two approaches is most effective. This model
uses a straightforward rule-based design that filters non-text regions according to
geometric properties.

Text and non-text regions can be separated using a number of geometric properties
[2,3], including:

 Aspect ratio
 Eccentricity
 Euler number
 Extent
 Solidity

Use regionprops to measure several of these properties, and then use the measurements
to eliminate non-text regions.

Figure 3.5 - Code representation to use regionprops

Figure 3.6 – Code representation to remove region
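A hedged sketch of this rule-based filtering, in the spirit of MathWorks' published text detection example; the threshold values are illustrative assumptions rather than prescribed ones:

mserStats = regionprops(mserConnComp, 'BoundingBox', 'Eccentricity', ...
    'Solidity', 'Extent', 'Euler', 'Image');
bbox = vertcat(mserStats.BoundingBox);          % [x y width height] per region
aspectRatio = bbox(:,3) ./ bbox(:,4);
filterIdx = aspectRatio' > 3;                   % wide, line-like regions
filterIdx = filterIdx | [mserStats.Eccentricity] > .995;
filterIdx = filterIdx | [mserStats.Solidity] < .3;
filterIdx = filterIdx | [mserStats.Extent] < 0.2 | [mserStats.Extent] > 0.9;
filterIdx = filterIdx | [mserStats.EulerNumber] < -4;
mserStats(filterIdx) = [];                      % keep only likely text regions
mserRegions(filterIdx) = [];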

3.5. Remove Non-Text Regions Based On Stroke Width Variation


A second basic measurement used to distinguish text from non-text is stroke width, a
measure of how wide the curves and lines that make up a character are. In general,
stroke widths vary little across text regions, while they vary much more across
non-text regions. To understand how stroke width helps eliminate non-text regions,
consider the stroke width of one of the detected MSER regions; a distance transform and
a binary thinning operation can be used to estimate it.

Figure 3.7 – Code representation to remove non text regions on stroke width detection

In order to use stroke width variation as a single measurement for removing non-text
regions from an image, the variation over the whole region must be quantified as
follows:

Figure 3.8 – Code representation to show the region alongside stroke width detection

Figure 3.9 – Code representation to compute and find the threshold stroke width
variation metric

Figure 3.10 – Code representation to process the remaining regions

Figure 3.11 – Code Representation to show the remaining text region
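A hedged sketch of the stroke width measurement and filter; the choice of region index and the 0.4 cutoff are illustrative assumptions:

regionImage = mserStats(6).Image;                  % one candidate region's mask
regionImage = padarray(regionImage, [1 1]);        % pad to avoid boundary effects
distanceImage = bwdist(~regionImage);              % distance to nearest background
skeletonImage = bwmorph(regionImage, 'thin', inf); % 1-pixel-wide skeleton
strokeWidthValues = distanceImage(skeletonImage);  % widths sampled on the skeleton
strokeWidthMetric = std(strokeWidthValues) / mean(strokeWidthValues);
strokeWidthThreshold = 0.4;                        % assumed cutoff
% In the full pipeline this metric is computed for every remaining region,
% and regions with strokeWidthMetric > strokeWidthThreshold are discarded.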

3.6. Finalizing the detection result by merging text regions


At this point, all detection results are composed of individual text characters. To use
these results for recognition tasks, the individual characters must be merged into
words or lines. This enables recognition of the actual words in an image, which carry
more meaning than the individual characters: for example, recognizing the string 'EXIT'
as opposed to the individual letters ['X', 'E', 'T', 'I'], where it is impossible to
comprehend the meaning of the word without the correct ordering of the letters.

One way to merge individual text regions into words or text lines is to first detect
neighbouring text regions and then form a bounding box around them. Expanding the
bounding boxes computed with regionprops allows neighbouring regions to be found: the
expansion makes the bounding boxes of adjacent text regions overlap, so that regions
belonging to the same word or text line form a chain of overlapping boxes.

Figure 3.12 – Code representation to get bounded boxes for all regions

Figure 3.13 – Code representation to clip the bounding boxes
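A hedged sketch of the expansion and clipping shown above; the 2% expansion amount is an illustrative assumption:

bboxes = vertcat(mserStats.BoundingBox);       % [x y width height] per character
xmin = bboxes(:,1);
ymin = bboxes(:,2);
xmax = xmin + bboxes(:,3) - 1;
ymax = ymin + bboxes(:,4) - 1;
expansionAmount = 0.02;                        % grow each box slightly
xmin = (1 - expansionAmount) * xmin;
ymin = (1 - expansionAmount) * ymin;
xmax = (1 + expansionAmount) * xmax;
ymax = (1 + expansionAmount) * ymax;
xmin = max(xmin, 1);                           % clip boxes to the image borders
ymin = max(ymin, 1);
xmax = min(xmax, size(I, 2));
ymax = min(ymax, size(I, 1));
expandedBBoxes = [xmin ymin xmax - xmin + 1 ymax - ymin + 1];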

3.7. Computing the overlap regions and setting the overlapping ratio

There is now a way to combine individual words or text lines by merging overlapping
bounding boxes. You will need to calculate the overlap ratio between all pairs of
bounding boxes. This quantifies the distances between all text regions, so that groups
of neighbouring regions can be found by looking for non-zero overlap ratios. Using a
graph, identify all text regions connected by non-zero pair-wise overlap ratios.

Utilize the bboxOverlapRatio function to compute the pair-wise overlap ratios for all
the expanded bounding boxes, and then use a graph to locate the connected regions.

Figure 3.14 – Code representation to compute the overlapping ratio and show graphical
representation

As a consequence of conncomp, you obtain indices that map each bounding box to a text
group. By combining the bounding boxes within each connected component, using the
minimum and maximum extents of the individual boxes, you can create a single bounding
box from multiple neighbouring boxes.

Finally, before displaying the final detection results, eliminate bounding boxes made
up of only one text region in order to suppress false detections. Such isolated regions
are unlikely to be actual text, since text is commonly gathered into groups (words and
sentences).

Figure 3.15 – Code representation to merge the boxes

Figure 3.16 – Code representation to get final bounded boxes
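A hedged sketch of the graph-based merge described in this section; it assumes the expanded boxes from the previous sketch and uses bboxOverlapRatio from the Computer Vision Toolbox:

overlapRatio = bboxOverlapRatio(expandedBBoxes, expandedBBoxes);
n = size(overlapRatio, 1);
overlapRatio(1:n+1:n^2) = 0;              % zero the diagonal (self-overlap)
g = graph(overlapRatio);                  % boxes as nodes, overlaps as edges
componentIndices = conncomp(g);           % connected boxes form one text group
xmin = accumarray(componentIndices', xmin, [], @min);   % merge each group into
ymin = accumarray(componentIndices', ymin, [], @min);   % a single bounding box
xmax = accumarray(componentIndices', xmax, [], @max);
ymax = accumarray(componentIndices', ymax, [], @max);
textBBoxes = [xmin ymin xmax - xmin + 1 ymax - ymin + 1];
numRegionsInGroup = histcounts(componentIndices, 1:max(componentIndices) + 1);
textBBoxes(numRegionsInGroup == 1, :) = [];   % drop single-region (likely false) hits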

3.8. Discussion
This chapter provides brief knowledge about the methods followed to automatically detect
text in a natural scene using the MSER technique. The image is converted from RGB to
grayscale and then examined on the basis of the BoundingBox, Eccentricity, Solidity,
Extent, Euler number, and Image properties. Using these functions, the non-text regions
are eliminated and the text regions are kept; the overlapping text regions are then
merged into one, and the detected text is shown as output. In short, the text region is
detected and displayed. This procedure can serve many purposes, such as a
text-to-speech feature for people who are unable to read or to see in the dark, for
people who are visually impaired, for dyslexic people, or for people who are in a new
country and unable to understand the regional text.

CHAPTER 4
SIMULATION, RESULT AND DISCUSSION

4.1 – Changing an image from RGB to grayscale


An RGB image is converted to grayscale using the rgb2gray command for easier
study; this eliminates the hue and saturation while retaining the luminance
that is needed for the analysis. A grayscale image is chosen because it
contains only shades of gray rather than other colours, so each pixel carries
less information to process than in a colour image. Another reason is that an
RGB image has three layers, whereas a grayscale image has a single layer with
intensity values from 0 to 255.
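A minimal sketch of this conversion; 'scene.jpg' is a placeholder filename for any natural-scene image:

% Read a colour scene image and convert it to grayscale.
colorImage = imread('scene.jpg');
grayImage = rgb2gray(colorImage);
figure
imshow(grayImage)
title('Grayscale input image')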

Figure 4.1 – Chinese Language converted to grayscale

Figure 4.2 – Oriya Language converted to grayscale

Figure 4.3 – Number converted to grayscale


Figure 4.4 – English Language and number converted to grayscale

Figure 4.5 – Alphabets, Symbol and number converted to grayscale

Figure 4.6 – Sign board converted to grayscale

4.2 – Detecting the Text Region in a Natural Scene
In any natural scene there are various text regions that need to be identified,
including areas of low visibility, so we used MSER text region detection, in
which the algorithm detects the regions of text in a natural scene. Text is
detected with an area-range threshold of [200, 8000] and a threshold delta of
4. detectMSERFeatures returns an MSERRegions object, regions, containing
information about the MSER features identified in the 2-D grayscale input image
I; the function uses the Maximally Stable Extremal Regions (MSER) algorithm to
find the regions. The detected MSER regions plotted over each picture are given
below in the results.
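A minimal sketch of the detection step with the thresholds quoted above:

% Detect MSER regions in the grayscale image.
[mserRegions, mserConnComp] = detectMSERFeatures(grayImage, ...
    'RegionAreaRange', [200 8000], 'ThresholdDelta', 4);

% Plot the detected regions over the image.
figure
imshow(grayImage)
hold on
plot(mserRegions, 'showPixelList', true, 'showEllipses', false)
title('MSER regions')
hold off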

Figure 4.7 – Chinese Language MSER Region

Figure 4.8 – Oriya MSER Region

Figure 4.9 – Number MSER Region

Figure 4.10 – Alphabet and Number MSER Region

Figure 4.11 – Alphabet, Symbol and Number MSER Region

Figure 4.12 – Sign Board MSER Region

4.3 – Remove Regions Other Than Text Based on Basic Geometric Properties

The MSER text detection uses regionprops, which by default returns the 'Area',
'Centroid', and 'BoundingBox' measurements; here it is called on the connected
components mserConnComp to measure the BoundingBox, Eccentricity, Solidity,
Extent, EulerNumber, and Image properties of each region. The bounding boxes of
the regions detected by MSER are concatenated into one matrix using vertcat.
The non-text regions are then filtered from the text regions by thresholding
the above measurements.
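A minimal sketch of this filtering; the numeric thresholds follow the standard MATLAB scene-text example and may need tuning for other images:

% Measure geometric properties of every MSER connected component.
mserStats = regionprops(mserConnComp, 'BoundingBox', 'Eccentricity', ...
    'Solidity', 'Extent', 'EulerNumber', 'Image');

% Aspect ratio from the concatenated bounding boxes.
bboxes = vertcat(mserStats.BoundingBox);
aspectRatio = bboxes(:,3) ./ bboxes(:,4);

% Flag regions whose geometry is unlikely for text.
filterIdx = aspectRatio' > 3;
filterIdx = filterIdx | [mserStats.Eccentricity] > 0.995;
filterIdx = filterIdx | [mserStats.Solidity] < 0.3;
filterIdx = filterIdx | [mserStats.Extent] < 0.2 | [mserStats.Extent] > 0.9;
filterIdx = filterIdx | [mserStats.EulerNumber] < -4;

% Keep only the likely text regions.
mserStats(filterIdx) = [];
mserRegions(filterIdx) = [];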

Figure 4.13 – Chinese Language After Removing Non-Text Regions Based on
Geometric Properties
Figure 4.14 – Oriya Language After Removing Non-Text Regions Based on Geometric
Properties

Figure 4.15 – Number After Removing Non-Text Regions Based on Geometric
Properties

Figure 4.16 – Alphabet and Number After Removing Non-Text Regions Based on
Geometric Properties

Figure 4.17 – Alphabet, Symbol and Number After Removing Non-Text Regions Based
on Geometric Properties

Figure 4.18 – Sign Board After Removing Non-Text Regions Based on Geometric
Properties

4.4 – Remove Non-Text Regions Based on Stroke Width Variation

In a natural scene, the remaining non-text regions are filtered out using the
per-region measurements in mserStats. The next step in extracting the text
regions is padding the binary image of each region to avoid, or overcome,
boundary effects. To compute the stroke width image, the distance transform of
the binary image BW is calculated using bwdist, which computes the Euclidean
distance transform, and a skeleton image is obtained with bwmorph, which
applies a specific morphological operation to BW. Stroke width variation is
then used to eliminate non-text regions. A stroke is the darker pixel region of
a character in an image, and the width variation is the difference between the
inner and outer boundaries of that darker area of text pixels. The idea is that
the stroke width within a genuine text region varies little, so regions with
high variation can be removed. The stroke width variation metric is computed
from two quantities: strokeWidthValues, the values of the distance image
sampled along the skeleton image, and strokeWidthMetric, the standard deviation
of strokeWidthValues divided by their mean. A threshold strokeWidthThreshold, a
constant with value 0.4, is then applied, and a region is filtered out when
strokeWidthMetric > strokeWidthThreshold. The remaining regions are processed
in a for loop, removing regions by the stroke width variation computation, and
the results are plotted below.
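A minimal sketch of the per-region stroke width test described above (padarray, bwdist, and bwmorph are Image Processing Toolbox functions):

% Filter out regions whose stroke width varies too much to be text.
strokeWidthThreshold = 0.4;
strokeWidthFilterIdx = false(1, numel(mserStats));

for j = 1:numel(mserStats)
    % Pad the region's binary image to avoid boundary effects.
    regionImage = padarray(mserStats(j).Image, [1 1]);

    % The distance transform sampled along the skeleton gives the
    % stroke width at each skeleton pixel.
    distanceImage = bwdist(~regionImage);
    skeletonImage = bwmorph(regionImage, 'thin', inf);
    strokeWidthValues = distanceImage(skeletonImage);

    % Coefficient of variation of the stroke widths.
    strokeWidthMetric = std(strokeWidthValues) / mean(strokeWidthValues);
    strokeWidthFilterIdx(j) = strokeWidthMetric > strokeWidthThreshold;
end

mserRegions(strokeWidthFilterIdx) = [];
mserStats(strokeWidthFilterIdx) = [];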

Figure 4.19 – Chinese Stroke Width Graph and Result After Removing Non-Text
Regions Using Stroke Width

Figure 4.20 – Oriya Stroke Width Graph and Result After Removing Non-Text
Regions Using Stroke Width

Figure 4.21 – Number Stroke Width Graph and Result After Removing Non-Text
Regions Using Stroke Width

Figure 4.22 – Alphabet and Number Stroke Width Graph and Result After Removing
Non-Text Regions Using Stroke Width

Figure 4.23 – Alphabet, Symbol and Number Stroke Width Graph and Result After
Removing Non-Text Regions Using Stroke Width

Figure 4.24 – Sign Board Stroke Width Graph and Result After Removing Non-Text
Regions Using Stroke Width

4.5 – Merge Text Regions for Final Detection Result


In the above steps we detected the text in a natural scene, but the main motive
of our project is to read out the text, which requires merging the separate
characters of the different languages and identifying the order of writing. The
text regions are merged into single rectangular boxes, separated at the spaces,
so that words can be extracted from the various natural scenes. The rectangular
boxes are found by concatenating mserStats.BoundingBox with vertcat. The
[xmin, ymin, width, height] representation of each box is converted to
[xmin, ymin, xmax, ymax] for convenience of computation, and the bounding boxes
are expanded by a small amount, taken here as 0.02. The bounding boxes are then
clipped so that they lie within the image, letting text areas be distinguished
from non-text areas, and are displayed as rectangles drawn with a line width of
3, which can be varied according to need. For better detection the overlap
ratio is computed; the overlap of each bounding box with itself is set to zero
to simplify plotting, and the connected text regions are detected within the
resulting graph. The boxes are merged on the basis of the maximum and minimum
dimensions of the bounding boxes using the function accumarray, which groups
the elements of a vector according to the supplied component indices and
applies a function over each group, in the pattern
x = accumarray(componentIndices', x, [], @min), and likewise with @max. The
merged boundaries are composed in [x, y, width, height] format. Finally,
bounding boxes containing only a single text region are removed, the whole
process is evaluated, and the final detected text of the chosen natural image
is displayed.
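A minimal sketch of the final display, assuming colorImage from Section 4.1 and textBBoxes from the merging step (insertShape is a Computer Vision Toolbox function):

% Draw the merged boxes on the original colour image.
ITextRegion = insertShape(colorImage, 'Rectangle', textBBoxes, 'LineWidth', 3);
figure
imshow(ITextRegion)
title('Detected text')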

Figure 4.25 – Chinese Final Merged Text Detection

Figure 4.26 – Oriya Final Merged Text Detection

Figure 4.27 – Number Final Merged Text Detection

Figure 4.28 – Number and Alphabets Final Merged Text Detection

Figure 4.29 – Number, Symbol and Alphabets Final Merged Text Detection

Figure 4.30 – Sign Board Final Merged Text Detection

CHAPTER 5
CONCLUSION

A key element of the thesis is the outline of the steps completed in order to
extract text in various languages and remove text-free regions; this proposed
MSER-based approach can therefore be applied in settings where normal vision
cannot distinguish the text. Military troops can use the algorithm in drones or
high-definition cameras, installed on the pilot's computer, to read out text in
enemy territory. Such modifications could bring far-reaching changes to
connecting the world, making local text accessible to a new generation of
people arriving in an unfamiliar non-English region: "वसुधैव कुटुम्बकम", meaning the
whole world is one family. Despite the language barrier, people can connect
with the land and understand it through this basic algorithm. The police
department can also use it to extract or read the number plates of offenders'
vehicles. MSER (Maximally Stable Extremal Regions) is the mechanism used in
this algorithm for text extraction; text extraction can also be accomplished
with other methods, such as a wider-baseline stereo setup that provides a
larger region for stereo matching to take place, resulting in better object
recognition.

5.1 Future Scope of Work


The future scope of this work includes the following points:
1) The proposed algorithm can be compared with other methods or algorithms to
see which gives the better response or result.
2) After text detection, the extracted text can be translated into the desired
language or read out through a voice command, making the system handy for
users.
